How can I tell what Database format a file (or set of files) was created with (in Delphi)?

前端 未结 3 639
温柔的废话
温柔的废话 2021-02-04 14:49

I have a number of data files created by many different programs. Is there a way to determine the database and version of the database that was used to create the data file.

相关标签:
3条回答
  • 2021-02-04 15:24

    First, I do not believe you could do more in a "quick scan" than provide a "possible format". Also, it's very difficult to imagine that any quick technique could be reliable.

    DBASE files commonly use the extension .dbf. There are variants of the dBase file format used by FoxPro, and Clipper. Wikipedia documents these as xBase. Any dBase library that can open dBase files will also probably be able to (a) show that this is in fact a true dBase file by opening it, and (b) allow you to see which supported variants of the xBase file format are in use.

    Access files are usually using the .mdb file format, but can be encrypted with a password. You could probably write your own library that could postiively identify the internal content as being of the "Jet database engine" (internal type of file used by Access) but not read the content, but I doubt that short of cracking the password, you could do this reliably.

    FileMaker files can have many file extensions, and their internal file formats are not well documented. According to wikipedia, .fm .fp3 .fp5 and .fp7 are common file extensions. You will have similar "password" problems with filemaker databases, as with Access. I am not aware of any way to read filemaker files in delphi except through ODBC, and even then, I don't think you could provide an "omni-reader" in Delphi that was powered by ODBC, since ODBC requires careful setup and knowledge of the originating file into an odbc data source before it becomes readable through ODBC. Browse/Discovery is not a phase that is supported by ODBC.

    SQLite files can have any file extension at all. The easiest way to try to detect it would be to load/open the file using SQLite and see if it opens.

    The rest of the list is more or less infinite, and the technique would be the same. Just keep rolling more database engines and access layer libraries into your Katamari Damaci Database Detector Tool.

    If you want to start with old database formats as you seem to be, I would investigate using BDE (ancient, but hey, you're talking about ancient stuff), plus ADO, to try to auto-detect and open files.

    0 讨论(0)
  • First of all, check the file extension. Take a look at the corresponding wikipedia article, or other sites.

    Then you can guess the file format from its so called "signature".

    This is mostly the first characters content, which is able to identify the file format.

    You've an updated list at this very nice Gary Kessler's website.

    For instance, here is how our framework identify the MIME format from the file content, on the server side:

    function GetMimeContentType(Content: Pointer; Len: integer;
      const FileName: TFileName=''): RawUTF8;
    begin // see http://www.garykessler.net/library/file_sigs.html for magic numbers
      result := '';
      if (Content<>nil) and (Len>4) then
        case PCardinal(Content)^ of
        $04034B50: Result := 'application/zip'; // 50 4B 03 04
        $46445025: Result := 'application/pdf'; //  25 50 44 46 2D 31 2E
        $21726152: Result := 'application/x-rar-compressed'; // 52 61 72 21 1A 07 00
        $AFBC7A37: Result := 'application/x-7z-compressed';  // 37 7A BC AF 27 1C
        $75B22630: Result := 'audio/x-ms-wma'; // 30 26 B2 75 8E 66
        $9AC6CDD7: Result := 'video/x-ms-wmv'; // D7 CD C6 9A 00 00
        $474E5089: Result := 'image/png'; // 89 50 4E 47 0D 0A 1A 0A
        $38464947: Result := 'image/gif'; // 47 49 46 38
        $002A4949, $2A004D4D, $2B004D4D:
          Result := 'image/tiff'; // 49 49 2A 00 or 4D 4D 00 2A or 4D 4D 00 2B
        $E011CFD0: // Microsoft Office applications D0 CF 11 E0 = DOCFILE
          if Len>600 then
          case PWordArray(Content)^[256] of // at offset 512
            $A5EC: Result := 'application/msword'; // EC A5 C1 00
            $FFFD: // FD FF FF
              case PByteArray(Content)^[516] of
                $0E,$1C,$43: Result := 'application/vnd.ms-powerpoint';
                $10,$1F,$20,$22,$23,$28,$29: Result := 'application/vnd.ms-excel';
              end;
          end;
        else
          case PCardinal(Content)^ and $00ffffff of
            $685A42: Result := 'application/bzip2'; // 42 5A 68
            $088B1F: Result := 'application/gzip'; // 1F 8B 08
            $492049: Result := 'image/tiff'; // 49 20 49
            $FFD8FF: Result := 'image/jpeg'; // FF D8 FF DB/E0/E1/E2/E3/E8
            else
              case PWord(Content)^ of
                $4D42: Result := 'image/bmp'; // 42 4D
              end;
          end;
        end;
      if (Result='') and (FileName<>'') then begin
        case GetFileNameExtIndex(FileName,'png,gif,tiff,tif,jpg,jpeg,bmp,doc,docx') of
          0:   Result := 'image/png';
          1:   Result := 'image/gif';
          2,3: Result := 'image/tiff';
          4,5: Result := 'image/jpeg';
          6:   Result := 'image/bmp';
          7,8: Result := 'application/msword';
          else begin
            Result := RawUTF8(ExtractFileExt(FileName));
            if Result<>'' then begin
              Result[1] := '/';
              Result := 'application'+LowerCase(Result);
            end;
          end;
        end;
      end;
      if Result='' then
        Result := 'application/octet-stream';
    end;
    

    You can use a similar function, from the GAry Kessler's list.

    0 讨论(0)
  • 2021-02-04 15:26

    There are lots of database engines with hundreds (if not thousands) of versions and formats. (Binary, CSV, XML...) Many of them are encrypted to protect the content. It is quite "impossible" to identify every database and every format and it is a subject of constant changes.

    So first of all you have to limit your task to a list of database engines you want to scan. Thats what i would do...

    0 讨论(0)
提交回复
热议问题