UTL_MATCH-like function to work with CLOB

纵然是瞬间 提交于 2019-12-22 10:44:56

问题


My question is: Is there a UTL_MATCH-like function which works with a CLOB rather than a VARCHAR2?

My specific problem is: I'm on an Oracle database. I have a bunch of pre-written queries which interface with Domo CenterView. The queries have variables in them defined by ${variableName}. I need to rewrite these queries. I didn't write the original so instead of figuring out what a good value for the variables should be I want to run the queries with the application and get what the query was from V$SQL.

So my solution is: Do a UTL_MATCH on the queries with the variable stuff in it and V$SQL.SQL_FULLTEXT. However, UTL_MATCH is limited to VARCHAR2 and the datatype of V$SQL.SQL_FULLTEXT is CLOB. So, this is why I'm looking for a UTL_MATCH-like function which works with a CLOB datatype.

Any other tips of how to accomplish this are welcome. Thanks!

Edit, about the tips. If you have a better idea of how to do this, let me just tell you some information I've got at my disposal. I have about 100 queries, they're all in an excel spreadsheet (the ones with the ${variableName} in them). So I could pretty easily use excel to write a query for me. I'm hoping to just union all those queries together and copy the output to another sheet. Anyway, maybe that's helpful if you're thinking there's a better way to do this.

An example: Let's say I have the following query from Domo:

select department.dept_name
from department
where department.id = '${selectedDepartmentId}'
;

I want to call something like this:

select v.sql_fulltext
from v$sql v
where utl_match.jaro_winkler_similarity(v.sql_fulltext,
'select department.dept_name
from department
where department.id = ''${selectedDepartmentId}''') > 90
;

And get something like this in return:

SQL_FULLTEXT
------------------------------------------
select department.dept_name
from department
where department.id = '154'

What I've tried:

I tried substringing the clob and casting it to a varchar. I was really hopeful this would work, but it gives me an error. Here's the code:

select v.sql_fulltext
from v$sql v
where  utl_match.jaro_winkler_similarity( cast( substr (v.sql_fulltext, 0, 4000) as varchar2 (4000)),
'select department.dept_name
from department
where department.id = ''${selectedDepartmentId}''') > 90
;

And here's the error:

ORA-22835: Buffer too small for CLOB to CHAR or BLOB to RAW conversion (actual: 8000, maximum: 4000)

However, if I run this it works fine:

select cast(substr(v.sql_fulltext, 0, 4000) as varchar2 (4000))
from v$sql v
;

So I'm not sure what the problem is with casting the substring...


回答1:


UTL_MATCH is a packaging for comparing strings with regards for checking how similar two strings are. Its functions evaluate strings and return scores. So all you're going to get is a number indicating (say) how many edits you need to turn ${variableName} into "Farmville" or "StackOveflow".

What you won't get is the actual differences: these two strings of text are identical except at offset 123 where it replaces ${variableName} with "Farmville".

Putting it like that suggests an alternative approach. Using INSTR() and SUBSTR() to locate instances of ${variableName} in your Domo CenterView queries and use those offsets to identify the different text in the v$sql.fulltext equivalents. You can do this with CLOB in PL/SQL with the DBMS_LOB package.




回答2:


If the text you want to search has length <= 32767, then you can just convert the CLOB to VARCHAR2 using DBMS_LOB.SUBSTR:

select v.sql_fulltext 
from v$sql v 
where utl_match.jaro_winkler_similarity(dbms_lob.substr(v.sql_fulltext), 'select department.dept_name from department where department.id = ''${selectedDepartmentId}''') > 90 ;



回答3:


I ended up creating a custom function for it. Here's the code:

CREATE OR REPLACE function match_clob(clob_1 clob, clob_2 clob) return number as

similar number := 0;
sec_similar number := 0;
sections number := 0;
max_length number := 3949;
length_1 number;
length_2 number;
vchar_1 varchar2 (3950);
vchar_2 varchar2 (3950);

begin
  length_1 := length(clob_1);
  length_2 := length(clob_2);
  --dbms_output.put_line('length_1: '||length_1);
  --dbms_output.put_line('length_2: '||length_2);
  IF length_1 > max_length or length_2 > max_length THEN

    FOR x IN 1 .. ceil(length_1 / max_length) LOOP

      --dbms_output.put_line('((x-1)*max_length) + 1'||(x-1)||' * '||max_length||' = '||(((x-1)*max_length) + 1));

      vchar_1 := substr(clob_1, ((x-1)*max_length) + 1, max_length);
      vchar_2 := substr(clob_2, ((x-1)*max_length) + 1, max_length);

--      dbms_output.put_line('Section '||sections||' vchar_1: '||vchar_1||' ==> vchar_2: '||vchar_2);

      sec_similar := UTL_MATCH.JARO_WINKLER_SIMILARITY(vchar_1, vchar_2);

      --dbms_output.put_line('sec_similar: '||sec_similar);

      similar := similar + sec_similar;
      sections := sections + 1;

    END LOOP;

    --dbms_output.put_line('Similar: '||similar||' ==> Sections: '||sections);
    similar := similar / sections;

  ELSE
    similar := UTL_MATCH.JARO_WINKLER_SIMILARITY(clob_1,clob_2);
  END IF;
  --dbms_output.put_line('Overall Similar: '||similar);
   return(similar);
end;
/


来源:https://stackoverflow.com/questions/10703609/utl-match-like-function-to-work-with-clob

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!