In this post I want to provide a practical use of the AI Vector search functionality in the Oracle database. The use case is to be able to use AI Vector search on data that Oracle Enterprise Manager (EM) produces. In this case incidents data. The idea is to be able to search incidents using vectors in Oracle.
But what is Vector search and why we want to use it? 
Vector search is a feature that enables searching data by semantics or meanings and values. We want to use it in combination with an LLM in order to ask questions about our data. What kind of data? In this case EM’s incident data.
There are several challenges that we’ll have to work on in order to be able to use AI Vector search. The first challenge is that AI Vector search is a feature of the 26AI (former 23AI) database. Oracle EM doesn’t support 26AI as a repository database. Oracle EM 24AI only supports 19.22 as a repository database as of today. So the question is, how we can me this work?
We will have to use an intermediate Oracle database for the AI Vector functionality. I decided to use the 26AI free version of the Oracle database. You can download a VirtualBox appliance image and run it in your personal computer. There are several limitations of the 26AI free version but it works for my use case.
Now that I have the 26AI appliance and database running, I need to bring the Oracle EM Incident data over. There are several ways to extract the data:
- Connect through SQL Plus to the repository database
 - Use REST APIs
 - Use EMCLI
 
I decided to use REST APIs as it gives me the data in JSON format. Then I can format the in any shape or form needed.
Oracle EM exposes several REST APIs so we can consume data on external systems. There’s a whole framework behind these REST APIs.
We are going to call below REST API from the host where the 26AI free database is running.
$ curl -X GET -u sysman:<password> -H "Content-Type:application/json"  https://EM_HOST:EM_CONSOLE_HTTPS_PORT/em/api/incidents --insecure
This produces a JSON output with the first set of all open incidents with the most recently created incidents listed first.
I can create a Shell script that executes this CURL command, obtains the data in JSON format and then executes jq in order to convert it into CSV format. It will look similar to this:
#!/bin/bash
# Fetch JSON data using curl
JSON_DATA=$(curl -X GET -u sysman:<password> -H "Content-Type:application/json"  https://EM_HOST:EM_CONSOLE_HTTPS_PORT/em/api/incidents --insecure)
# Define the jq filter to extract and format data into CSV
# @csv: Convert the array into a CSV-formatted string
JQ_FILTER='.[].results | [.id, .displayId, .message, .severity, .status, .isOpen, .timeCreated, .timeUpdated, .target_names] | @csv' # Example filter
# Extract data and format as CSV using jq
CSV_CONTENT=$(echo "$JSON_DATA" | jq -r "$JQ_FILTER")
# Define CSV header (adjust to match your selected fields)
CSV_HEADER="id,displayId,message,severity, status, isOpen, timeCreated, timeUpdated, target_names"
# Output to a CSV file
echo "$CSV_HEADER" > incidents.csv
echo "$CSV_CONTENT" >> incidents.csv
echo "CSV data saved to incidents.csv"
We need now to insert this data into the 26AI free database and create the Vectors. Before we do that we need to find a way to generate the vectors from our incident data. For that we will use a model provided by Oracle named all-MiniLM-L12-v2.
Let’s download and load the model into our 26AI free database.
$ mkdir -p /home/oracle/onnx
$ cd /home/oracle/onnx
$ wget https://adwc4pm.objectstorage.us-ashburn-1.oci.customer-oci.com/p/VBRD9P8ZFWkKvnfhrWxkpPe8K03-JIoM5h_8EJyJcpE80c108fuUjg7R5L5O7mMZ/n/adwc4pm/b/OML-Resources/o/all_MiniLM_L12_v2_augmented.zip
$ unzip -oq all_MiniLM_L12_v2_augmented.zip
$ sqlplus sys as sysdba
SQL> create user em_inc identified by password quota unlimited on users;
SQL> grant create session, db_developer_role, create mining model to em_inc;
SQL> create or replace directory model_onnx as '/home/oracle/onnx';
SQL> grant read, write on directory model_onnx to em_inc;
SQL> begin
  dbms_vector.drop_onnx_model (
    model_name => 'ALL_MINILM_L12_V2',
    force => true);
  dbms_vector.load_onnx_model (
    directory  => 'model_onnx',
    file_name  => 'all_MiniLM_L12_v2.onnx',
    model_name => 'ALL_MINILM_L12_V2');
end;
/
SQL> select model_name, algorithm, mining_function from   user_mining_models where  model_name = 'ALL_MINILM_L12_V2';
MODEL_NAME                     ALGORITHM  MINING_FUNCTION
------------------------------ ---------- ---------------
ALL_MINILM_L12_V2              ONNX       EMBEDDING
At this point I can generate the Vector embeddings for the incident data.
We will now import our incident data from the CSV file and create the vector.
SQL> conn em_inc/password
SQL> create table incidents as
select id, displayId, message, severity, status, isOpen, timeCreated, timeUpdated, target_names
from   external (
         (
           id  varchar2(40),
           displayId        number(6),
           message   varchar2(500),
           severity   varchar2(100),
           status  varchar(400),
           isOpen  varchar(50),
           timeCreated varchar(400),
           timeUpdated varchar(400),
           target_names varchar(1500)
         )
         type oracle_loader
         default directory model_dir
         access parameters (
           records delimited by newline
           skip 1
           badfile model_dir
           logfile model_dir:'incidents_ext_tab_%a_%p.log'
           discardfile model_dir
           fields csv with embedded terminated by ',' optionally enclosed by '"'
           missing field values are null
        )
        location ('incidents.csv')
        reject limit unlimited
      );
SQL> alter table incidents add (
  incidents_vector vector
);
SQL> update incidents
set incidents_vector = vector_embedding(all_minilm_l12_v2 using message as data);
SQL> commit;
Let’s now perform a Vector similarity search using the VECTOR_DISTANCE function. This first query will do a search on “Incidents related to agents” query Vector.
SQL> SELECT vector_distance(incidents_vector, (vector_embedding(all_minilm_l12_v2 using 'Incidents related to agents' as data)), EUCLIDEAN) as distance,
       message
FROM   incidents
order by distance
FETCH EXACT FIRST 10 ROWS ONLY;
| DISTANCE | MESSAGE | 
| 0.98592836033379427 | Process em_agent crashed unexpectedly. | 
| 0.98592836033379427 | Process em_agent crashed unexpectedly. | 
| 0.98592836033379427 | Process em_agent crashed unexpectedly. | 
| 1.08642161802020090 | Number of Active Agents crossed the critical threshold (1). Current value: 0 | 
| 1.09352931649992050 | Agent Unreachable (REASON = Unable to connect to the agent at https://host:3872/emd/main/ [Connection refused]). Host is reachable. | 
| 1.09352931649992050 | Agent Unreachable (REASON = Unable to connect to the agent at https://host:3872/emd/main/ [Connection refused]). Host is reachable. | 
| 1.09352931649992050 | Agent Unreachable (REASON = Unable to connect to the agent at https://host:3872/emd/main/ [Connection refused]). Host is reachable. | 
| 1.23613168789133000 | Process ora_pmon crashed unexpectedly. | 
| 1.23613168789133000 | Process ora_pmon crashed unexpectedly. | 
Let’s analyze these results. The shortest distance in the Similarity Vector search is for the message containing the same exact word “agent”, the second closest distance is for messages having the word “Agent” and the third one is for messages not having the word “a(A)gent”.
With this functionality we can perform Similarity Vector searches on our EM incident data.
Thanks,
Alfredo