Find out the real file type

遇见更好的自我 2020-12-29 15:02

I am working on an ASP web page that handles file uploads. Only certain types of files are allowed to be uploaded, like .XLS, .XML, .CSV, .TXT, .PDF, .PPT, etc.

I

7 answers
  • 2020-12-29 15:21

    I know you said C#, but this could maybe be ported. Also, it has an XML file already containing many descriptors for common file types.

    It's a Java library called JMimeMagic. It's here: http://jmimemagic.sourceforge.net/

  • 2020-12-29 15:24

    One way would be to check for certain signatures or magic numbers in the files. This page has a handy list of known file signatures and seems quite up to date:

    http://www.garykessler.net/library/file_sigs.html
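
    As a rough illustration of the signature check described above, here is a minimal sketch that compares the first bytes of an upload against a small table of well-known magic numbers. The names `Signature`, `kSignatures` and `sniffType` are made up for this example, and the table covers only a handful of formats; a real list, such as the one linked above, is much longer.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// One known signature: the bytes expected at the very start of the file.
struct Signature {
    const char* name;                  // label chosen for this sketch
    std::vector<unsigned char> magic;  // leading bytes of the format
};

// A few well-known signatures (illustrative, far from complete).
static const std::vector<Signature> kSignatures = {
    {"pdf",  {0x25, 0x50, 0x44, 0x46}},                          // "%PDF"
    {"zip",  {0x50, 0x4B, 0x03, 0x04}},                          // also .docx/.xlsx/.pptx
    {"ole2", {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1}},  // legacy .doc/.xls/.ppt
    {"exe",  {0x4D, 0x5A}},                                      // "MZ", Windows executables
};

// Returns the label of the first signature matching the start of `data`,
// or "unknown" when none matches or the buffer is too short.
std::string sniffType(const unsigned char* data, std::size_t size) {
    for (const Signature& sig : kSignatures) {
        if (size >= sig.magic.size() &&
            std::memcmp(data, sig.magic.data(), sig.magic.size()) == 0) {
            return sig.name;
        }
    }
    return "unknown";
}
```

    Plain-text formats such as .TXT, .CSV and most .XML files have no fixed signature, so a check like this can only confirm the binary formats on the allowed list.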

  • 2020-12-29 15:41

    In other words if a trojan.exe was renamed to harmless.pdf and uploaded, the application must be able to find out that the uploaded file is NOT a .PDF file.

    That's not really a problem. If a .exe was uploaded as a .pdf and you correctly served it back to the downloader as application/pdf, all the downloader would get is a broken PDF. They would have to manually rename it to .exe to be harmed.

    The real problems are:

    1. Some browsers may sniff the content of the file and decide they know better than you what type of file it is. IE is particularly bad at this, tending to render the file as HTML if it sees any HTML tags lurking near the start of the file. This is particularly unhelpful because it means script can be injected onto your site, potentially compromising any application-level security (cookie stealing et al). Workarounds include always serving the file as an attachment using Content-Disposition, and/or serving files from a different hostname, so they can't cross-site-script back onto your main site.

    2. PDF files are not safe anyway! They can be full of scripting, and have had significant security holes. Exploitation of a hole in the PDF reader browser plugin is currently one of the most common means of installing trojans on the web. And there's almost nothing you can usually do to try to detect the exploits as they can be highly obfuscated.
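
    The Content-Disposition workaround from point 1 could look roughly like the sketch below, which only builds the header lines as strings. The function names are invented for this example. Serving everything as application/octet-stream and adding the `X-Content-Type-Options: nosniff` header are assumptions on top of the answer above, not something it prescribes.

```cpp
#include <string>
#include <vector>

// Replaces every character outside a conservative whitelist, so the name can
// be placed inside a quoted header value without any escaping.
std::string sanitizeFileName(const std::string& name) {
    std::string out;
    for (unsigned char c : name) {
        const bool safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                          (c >= '0' && c <= '9') || c == '.' || c == '-' || c == '_';
        out += safe ? static_cast<char>(c) : '_';
    }
    return out.empty() ? "download" : out;
}

// Header lines that ask the browser to save the file instead of rendering it.
std::vector<std::string> downloadHeaders(const std::string& fileName) {
    return {
        "Content-Type: application/octet-stream",  // assumption: generic binary type
        "Content-Disposition: attachment; filename=\"" + sanitizeFileName(fileName) + "\"",
        "X-Content-Type-Options: nosniff",         // assumption: extra anti-sniffing header
    };
}
```

    For example, `downloadHeaders("x y.pdf")` yields a Content-Disposition line with `filename="x_y.pdf"`. Serving from a separate hostname, the other workaround mentioned, is a deployment decision and is not shown here.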

  • 2020-12-29 15:42

    Get the file headers of the "safe" file types - executables always have their own types of headers, and you can probably detect those. You'd have to be familiar with every format that you intend to accept, however.
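
    To illustrate the executable-header idea, here is a minimal sketch that reads the first bytes of a file on disk and looks for two common executable headers, "MZ" (Windows) and "\x7FELF" (Linux/Unix). The function name is hypothetical, and other executable and script formats are deliberately left out, so a negative result does not prove the file is harmless.

```cpp
#include <cstring>
#include <fstream>
#include <string>

// Returns true when the file at `path` starts with a known executable header.
// Unreadable files return false here; an upload handler would likely reject
// those separately.
bool fileLooksLikeExecutable(const std::string& path) {
    static const unsigned char kMz[]  = {0x4D, 0x5A};              // "MZ"
    static const unsigned char kElf[] = {0x7F, 0x45, 0x4C, 0x46};  // "\x7FELF"

    std::ifstream in(path, std::ios::binary);
    if (!in) return false;

    unsigned char head[4] = {0, 0, 0, 0};
    in.read(reinterpret_cast<char*>(head), sizeof(head));
    const std::streamsize got = in.gcount();  // may be fewer than 4 bytes

    if (got >= 2 && std::memcmp(head, kMz, sizeof(kMz)) == 0) return true;
    if (got >= 4 && std::memcmp(head, kElf, sizeof(kElf)) == 0) return true;
    return false;
}
```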

  • 2020-12-29 15:44

    On *NIX systems we have a utility called file(1). Try to find something similar for Windows; the file utility itself has also been ported.

  • 2020-12-29 15:45

    The following C++ code could help you:

    #include <windows.h>
    #include <cstring>

    // Return values:
    // -1 : file does not exist or cannot be read
    //  0 : not an Office document
    //  1 : (general) MS Office 2007
    //  2 : (general) MS Office older than 2007
    //  3 : MS Office 2003 PowerPoint presentation
    //  4 : MS Office 2003 Excel spreadsheet
    //  5 : other MS Office applications, or other files in the same container format
    int IsOffice2007OrOlder(const wchar_t* fileName)
    {
        // Signatures at offset 0
        static const BYTE sigOffice2007[8] = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00}; // Office 2007 documents
        static const BYTE sigOldOffice[8]  = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1}; // possible Office 2003 documents

        // Sub-signatures at offset 512
        static const BYTE sigXlsPpt[4]  = {0xFD, 0xFF, 0xFF, 0xFF};                         // XLS or PPT
        static const BYTE sigOnlyPpt[4] = {0x00, 0x6E, 0x1E, 0xF0};                         // PPT
        static const BYTE sigOnlyDoc[4] = {0xEC, 0xA5, 0xC1, 0x00};                         // DOC
        static const BYTE sigOnlyXls[8] = {0x09, 0x08, 0x10, 0x00, 0x00, 0x06, 0x05, 0x00}; // XLS

        HANDLE fileHandle = CreateFileW(fileName, GENERIC_READ,
            FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_READONLY, NULL);
        if (INVALID_HANDLE_VALUE == fileHandle)
        {
            return -1;
        }

        int iRet = 0;
        BYTE buff[8];
        DWORD bytesRead = 0;

        if (!ReadFile(fileHandle, buff, sizeof(buff), &bytesRead, NULL))
        {
            iRet = -1;
        }
        else if (bytesRead == sizeof(buff) && memcmp(buff, sigOffice2007, sizeof(sigOffice2007)) == 0)
        {
            iRet = 1; // Office 2007 file format
        }
        else if (bytesRead == sizeof(buff) && memcmp(buff, sigOldOffice, sizeof(sigOldOffice)) == 0)
        {
            // Old Office file format: check offset 512 to narrow down the real format.
            if (SetFilePointer(fileHandle, 512, NULL, FILE_BEGIN) == INVALID_SET_FILE_POINTER ||
                !ReadFile(fileHandle, buff, sizeof(buff), &bytesRead, NULL))
                iRet = 0; // as in the original, a failed second read means "not Office"
            else if (bytesRead >= 4 && (memcmp(buff, sigXlsPpt, 4) == 0 || memcmp(buff, sigOnlyDoc, 4) == 0))
                iRet = 2;
            else if (bytesRead >= 4 && memcmp(buff, sigOnlyPpt, 4) == 0)
                iRet = 3;
            else if (bytesRead == 8 && memcmp(buff, sigOnlyXls, 8) == 0)
                iRet = 4;
            else
                iRet = 5;
        }

        CloseHandle(fileHandle); // closed on every path once the file was opened

        return iRet;
    }
    