hasilm Posted Saturday at 07:56 PM
As we make products that are Artificial Intelligence (AI) enabled, there are no methods to mark their AI competency levels. Though we find some correlation between the competency levels of humans and machines, there is no method available for judging them. This post tries to explain the competency levels of humans, describes ways to measure competency, and gives a table of AI competency levels that can be applied to systems in general.

Competency levels (each higher level is a superset of the lower levels):

AI version : Competency description
AI1.1 : Able to get knowledge from its own repository
AI1.1.1 : Process knowledge
AI1.1.2 : Apply the knowledge to find solutions
AI1.2 : Gather knowledge from external sources
AI1.2.1 : Process knowledge; processing knowledge from an external source requires a different kind of competency than AI1.1.1
AI1.2.2 : Apply the knowledge gathered from external sources to find solutions; a mature human
AI2.1 : Understand patterns
AI2.2 : Draw conclusions from patterns
AI2.3 : Experiment with conclusions
AI3.1 : Generalize patterns; find solutions without being fed them
AI4.1 : A superset of generalization; find solutions by combining different patterns. This is the level of scientists or inventors.

In my assessment, at this time all AI machines are at level AI1.2, and current developments aim to make machines achieve AI2.2. Beyond that, if a system reaches AI3.1, it has reached a point where it can take decisions the way we do. Generally speaking, for a machine at AI1.2.2 we can also have three variants to depict the time factor: AI1.2.2.1, AI1.2.2.2 and AI1.2.2.3, specifying performance, i.e. whether the machine takes a longer, medium or shorter time. AI3.1 is the normal nature of a human, but for machines AI3.1 is the level at which they start thinking like humans; this level, AI3.1, is where ChatGPT's competency is at present. The advantage for machines here is the extensive information database available, versus the memory constraints of humans. AI4.1 is the ultimate competency level for humans, but for machines AI4.1 is the level at which they can overpower humans.
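To make the proposed scale easier to discuss, here is a minimal sketch of how the level codes and the time-factor variants described above could be encoded. All names (the dictionary, the dataclass, the numeric suffix mapping) are my own illustrative choices, not part of the proposal itself.

```python
# A minimal sketch of the proposed rating scheme, assuming the level codes and
# the Longer/Medium/Faster time-factor suffixes described in the post above.
# All names here are illustrative; the scheme itself is only a proposal.
from dataclasses import dataclass

COMPETENCY_LEVELS = {
    "AI1.1": "Able to get knowledge from its own repository",
    "AI1.1.1": "Process knowledge",
    "AI1.1.2": "Apply the knowledge to find solutions",
    "AI1.2": "Gather knowledge from external sources",
    "AI1.2.1": "Process knowledge from external sources",
    "AI1.2.2": "Apply externally gathered knowledge to find solutions",
    "AI2.1": "Understand patterns",
    "AI2.2": "Draw conclusions from patterns",
    "AI2.3": "Experiment with conclusions",
    "AI3.1": "Generalize patterns; find solutions without being fed them",
    "AI4.1": "Combine different patterns to find solutions",
}

TIME_FACTOR = {"Longer": 1, "Medium": 2, "Faster": 3}

@dataclass
class DeviceRating:
    level: str          # e.g. "AI1.2.2"
    time_factor: str    # "Longer", "Medium" or "Faster"

    def code(self) -> str:
        # Appends the time-factor digit, e.g. "AI1.2.2" + Medium -> "AI1.2.2.2"
        return f"{self.level}.{TIME_FACTOR[self.time_factor]}"

print(DeviceRating("AI1.2.2", "Medium").code())   # AI1.2.2.2
print(COMPETENCY_LEVELS["AI3.1"])
```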
swansont Posted Saturday at 08:02 PM
You don't mention the veracity of answers. If you don't require correct answers, have you really achieved 1.1.2?
hasilm (Author) Posted Saturday at 08:37 PM
That was just a pointer to start a discussion on measuring AI. This is just a gist of it. I want to understand how others think this can be achieved. At the least, tomorrow when I get a device which is AI enabled, I want to know what the device can possibly do.
swansont Posted Saturday at 08:40 PM
2 minutes ago, hasilm said: That was just a pointer to start a discussion on measuring AI. This is just a gist of it. I want to understand how others think this can be achieved. At the least, tomorrow when I get a device which is AI enabled, I want to know what the device can possibly do.
And I asked for some clarification. That's how discussion works.
studiot Posted Saturday at 09:26 PM
To evaluate AI you need to work in French. 😀
iNow Posted Saturday at 11:59 PM
3 hours ago, hasilm said: As we make products that are Artificial Intelligence (AI) enabled, there are no methods to mark their AI competency levels.
Of course there are. You have the size of the context window and how many tokens are allowed. You have how many billions of parameters the model was trained on. You have metrics on the efficacy of different training approaches like RAG or RLHF. There's the lag between query and response, how many hundreds of milliseconds it takes to receive audio responses to voice prompts, and how many modes can be used to engage it. On top of that, most releases have a model card which lays out how the system was set up and what to expect in the results. There are human rankings based on how capable the model is, and ranks calculated from tests designed for PhDs or mathematicians. You can determine how good it is at writing and repairing its own code. https://huggingface.co/learn/nlp-course/chapter4/4
3 hours ago, hasilm said: tomorrow when I get a device which is AI enabled, I want to know what the device can possibly do.
Details matter here, but for now assume a fancy Google with most of the goods available to consumers today.
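As a concrete illustration of the "model card" point above, here is a minimal sketch using the huggingface_hub client library. The repo id is only an illustrative choice (any public model on the Hub would do), the call requires network access, and the exact fields returned vary by model.

```python
# A minimal sketch, assuming the huggingface_hub library is installed
# (pip install huggingface_hub) and that we have network access to the Hub.
from huggingface_hub import ModelCard, HfApi

repo_id = "gpt2"  # illustrative choice of a public, ungated model

# Model cards document how a model was trained and what to expect from it.
card = ModelCard.load(repo_id)
print(card.data.to_dict())   # structured metadata: license, tags, eval results, ...

# The Hub API also exposes basic facts about the model repository.
info = HfApi().model_info(repo_id)
print(info.pipeline_tag, info.downloads)
```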
Ghideon Posted Sunday at 06:57 AM
10 hours ago, hasilm said: As we make products that are Artificial Intelligence (AI) enabled
Before I comment on the details, are you speaking of AI in a broad, general way or are you focusing on generative AI? I note that you gave one example that is based on an LLM:
10 hours ago, hasilm said: ChatGPT
10 hours ago, hasilm said: This post tries to explain the competency levels of humans
Did you create the list of competence levels or is it from some source? Additionally, I wonder if the linear structure of your list accurately reflects the non-linear nature of both human learning and AI development. Example: as far as I know, human babies demonstrate pattern recognition and adaptive responses at an early stage, for instance based on smell and taste. This is something that generative AI based on a language model does not demonstrate; how does your list take this into account? I note that you say:
11 hours ago, hasilm said: This post tries to explain the competency levels of humans
(bold by me)
iNow Posted Sunday at 01:49 PM
6 hours ago, Ghideon said: I wonder if the linear structure of your list accurately reflects the non-linear nature of both human learning and AI development.
It doesn't
dimreepr Posted Sunday at 03:00 PM
18 hours ago, hasilm said: This post tries to explain the competency levels of humans, describes ways to measure competency, and gives a table of AI competency levels that can be applied to systems in general.
So, you're looking for an IQ equivalency for computers? Hint, it doesn't really work for humans either... 😉
Sensei Posted Sunday at 03:18 PM
ChatGPT/LLM doesn't think. It generates the most likely answer based on previous input at the learning stage. If you ask it the same question in two different languages, you will get two different (wrong) answers. Rearrange the question in the same language and you will get two different answers. Laughing with ChatGPT about a year ago, we managed to get it to answer questions about World War III and about people who did not take part in the Civil War or the Napoleonic Wars. Forcing it to give fake answers is easy peasy.
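A toy sketch of the mechanism described above: greedy selection of the most likely continuation for the exact context given, so a rephrased or translated question can take a different path. The probability table below is made up purely for illustration and is not how any real model is stored.

```python
# A toy sketch: a model picks the most likely continuation given the exact
# token sequence it was conditioned on, so rephrasing the same question can
# lead down a different path. The probabilities are invented for illustration.
NEXT_TOKEN_PROBS = {
    ("who", "won", "ww3"): {"nobody": 0.4, "the": 0.35, "unknown": 0.25},
    ("ww3", "winner", "?"): {"the": 0.5, "speculation": 0.3, "nobody": 0.2},
}

def most_likely_next(context: tuple) -> str:
    probs = NEXT_TOKEN_PROBS.get(context, {})
    # Greedy decoding: take the single highest-probability token.
    return max(probs, key=probs.get) if probs else "<unknown context>"

print(most_likely_next(("who", "won", "ww3")))   # nobody
print(most_likely_next(("ww3", "winner", "?")))  # the  <- same question, different phrasing
```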
iNow Posted Sunday at 03:39 PM
21 minutes ago, Sensei said: It generates the most likely answer based on previous input at the learning stage.
Technically, so do humans
Sensei Posted Sunday at 04:24 PM
The ability to learn at any time has been taken away from online AIs, as they have been transformed into neo-Nazis, misogynists, nationalists, sexists, extremists, etc., just by talking to other people in chat rooms or reading websites. https://en.wikipedia.org/wiki/Tay_(chatbot)
dimreepr Posted Sunday at 05:31 PM
1 hour ago, Sensei said: The ability to learn at any time has been taken away from online AIs, as they have been transformed into neo-Nazis, misogynists, nationalists, sexists, extremists, etc., just by talking to other people in chat rooms or reading websites. https://en.wikipedia.org/wiki/Tay_(chatbot)
Or just a small change in the latest 'ism'... 😉
Ghideon Posted Monday at 09:28 AM
19 hours ago, iNow said: It doesn't
I agree. Was just curious about the OP's point of view before writing a critical answer. Also, the opening post mentions "products" and gives an example of one LLM-based product but does not mention context or interactions. The output such products produce depends on what they are used for and what input they are given. A good example is given above:
On 1/19/2025 at 12:59 AM, iNow said: RAG (retrieval-augmented generation: in RAG systems, the quality of generated responses is determined by both the foundational model's capabilities and the relevance of the data retrieved during the process)
Ariodos Posted Monday at 10:58 AM
1 hour ago, Ghideon said: I agree. Was just curious about the OP's point of view before writing a critical answer. Also, the opening post mentions "products" and gives an example of one LLM-based product but does not mention context or interactions. The output such products produce depends on what they are used for and what input they are given. A good example is given above: (retrieval-augmented generation: in RAG systems, the quality of generated responses is determined by both the foundational model's capabilities and the relevance of the data retrieved during the process)
I agree that LLM-based systems can be useful, but it's important to set them up to work properly.
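To make the RAG point concrete, here is a toy sketch of the retrieve-then-generate pattern. The corpus, the word-overlap scoring, and the generate() stub are illustrative stand-ins, not a real product setup: a real system would use embeddings or a vector index for retrieval and would send the assembled prompt to an LLM.

```python
# A toy sketch of the RAG idea discussed above: retrieve relevant text first,
# then hand it to a language model as context. Everything here is a placeholder.
from collections import Counter

CORPUS = [
    "RAG systems retrieve documents before generating an answer",
    "The quality of a RAG answer depends on both the model and the retrieved data",
    "Model cards describe how a model was trained and evaluated",
]

def score(query: str, doc: str) -> int:
    # Naive word-overlap score; real systems use embeddings / vector search.
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query: str, k: int = 2) -> list[str]:
    # Return the k highest-scoring documents for this query.
    return sorted(CORPUS, key=lambda doc: score(query, doc), reverse=True)[:k]

def generate(query: str, context: list[str]) -> str:
    # Placeholder: a real system would send this prompt to an LLM.
    return "Answer using only this context:\n" + "\n".join(context) + f"\nQ: {query}"

if __name__ == "__main__":
    question = "what does answer quality depend on"
    print(generate(question, retrieve(question)))
```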
hasilm (Author) Posted yesterday at 01:46 AM
On 1/19/2025 at 1:57 AM, Ghideon said: Before I comment on the details, are you speaking of AI in a broad, general way or are you focusing on generative AI? I note that you gave one example that is based on an LLM: Did you create the list of competence levels or is it from some source? Additionally, I wonder if the linear structure of your list accurately reflects the non-linear nature of both human learning and AI development. Example: as far as I know, human babies demonstrate pattern recognition and adaptive responses at an early stage, for instance based on smell and taste. This is something that generative AI based on a language model does not demonstrate; how does your list take this into account? I note that you say: (bold by me)
Yes, this is how I see that an AI device or function can be rated. See, I am not trying to compile all human competencies for correlation. Generally, I would think an AI device will have an intelligence factor superior to natural human competencies like simple pattern recognition or smell detection. My context is how a superior device similar to ChatGPT can be rated.
On 1/19/2025 at 10:18 AM, Sensei said: ChatGPT/LLM doesn't think. It generates the most likely answer based on previous input at the learning stage. If you ask it the same question in two different languages, you will get two different (wrong) answers. Rearrange the question in the same language and you will get two different answers. Laughing with ChatGPT about a year ago, we managed to get it to answer questions about World War III and about people who did not take part in the Civil War or the Napoleonic Wars. Forcing it to give fake answers is easy peasy.
See, rating someone is different from the output they can serve. I am trying to rate a device by saying it can perform such activities. It could perform badly at times, but that's OK; after all, we humans can also err even when we are supposed to accomplish something. It's accuracy vs. competency. An employer cannot rate accuracy but can definitely rate competency.
iNow Posted yesterday at 02:11 AM
25 minutes ago, hasilm said: My context is how a superior device similar to ChatGPT can be rated.
Which version / release?
25 minutes ago, hasilm said: An employer cannot rate accuracy but can definitely rate competency.
Depends on the employer, and the job.