XML parsing conundrum

甜味超标 2021-01-15 22:25

UPDATE: I've reworked the question to show the progress I've made, and maybe make it easier to answer.

UPDATE 2: I've added another value to the XML. Extension avail

7 answers
  • 2021-01-15 22:48

    This is the code that will give you the result you need. UPDATE: This concerns the latest grouping you asked for.

    $scrape_xml = "files.xml";
    $xml = simplexml_load_file($scrape_xml);
    $groups2 = array(); // the loop below fills $groups2, so initialise that name
    
    foreach ($xml->Item as $file){
        $platform = stripslashes($file->Platform);
        $name = stripslashes($file->Name);
        $title = stripslashes($file->Title);
        // Split the extension list on any run of whitespace.
        $extensions = preg_split('~\s+~', trim((string) $file->Ext));
    
        foreach($extensions as $extension)
        {
            if (!isset($groups2[$platform])) $groups2[$platform] = array();
            if (!isset($groups2[$platform][$extension])) $groups2[$platform][$extension] = array();
    
            $groupFound = false;
            for($idx = 0; $idx < count($groups2[$platform][$extension]); $idx ++) {
                if ($groups2[$platform][$extension][$idx]["Name"] == $name 
                    && $groups2[$platform][$extension][$idx]["Title"] == $title) {
    
                    $groups2[$platform][$extension][$idx]["Files"][] =
                        array('DownloadPath' => $file->DownloadPath."");
    
                    $groupFound = true;
    
                    break;
                }
            }
    
            if ($groupFound) continue;
    
            $groups2[$platform][$extension][] = 
                array(
                    "Name" => $name,
                    "Title" => $title,
                    "Files" => array(array('DownloadPath' => $file->DownloadPath."")));
        }
    }
    
    echo "<br />";
    echo "<pre>";
    print_r($groups2);
    echo "</pre>";
    
  • 2021-01-15 22:49

    You haven't explained what you're seeing wrong, exactly, so I'm going to have to guess.

    First, in your source, the last DownloadPath is /this/windows/3/1.zip even though it is supposed to be a Mac file. That is surely a typo, but the output will "look wrong" with it there.

    Next, if you want strings rather than SimpleXMLElement objects, you need this (I have also tidied it up to avoid so many stripslashes() calls):

    foreach ($xml->Item as $file) {
        $platform = stripslashes((string) $file->Platform);
        $name = stripslashes((string) $file->Name);
        $title = stripslashes((string) $file->Title);
        if( !isset($groups[$platform][$name][$title])) {
            $groups[$platform][$name][$title] = array(
                'Platform' => $platform,
                'Name' => $name,
                'Title' => $title 
            );
        } 
        $groups[$platform][$name][$title]['Files'][] = (string) $file->DownloadPath;
    }
    

    Notice the (string) casts? They convert the object to a string, which gives you the literal value rather than the object. This is also why your array keys worked: they were cast to strings internally (only strings and integers may be used as array keys).
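
    To illustrate the casting point, here is a minimal sketch. The inline XML is a made-up one-item sample, not the asker's file; only the element names are taken from the code above.

```php
<?php
// Hypothetical one-item sample; element names follow the answer above.
$xml = simplexml_load_string(
    '<Items><Item><Platform>Windows</Platform><Name>Demo</Name></Item></Items>'
);

$item = $xml->Item[0];

// Without a cast, the property is still a SimpleXMLElement object.
$raw = $item->Platform;
var_dump($raw instanceof SimpleXMLElement); // bool(true)

// With a cast, it is a plain PHP string.
$platform = (string) $item->Platform;
var_dump(is_string($platform)); // bool(true)

// Storing cast values keeps objects out of the result array.
$groups = array();
$groups[$platform]['Name'] = (string) $item->Name;
print_r($groups);
```

    If the casts are dropped, print_r() shows nested SimpleXMLElement objects instead of the literal values.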

    I think that's all I can find that might answer your question. If it isn't please let me know more clearly what's wrong and I'll be happy to try and help.

  • 2021-01-15 22:53

    How's something like this? Code is a bit sloppy, and tweaks should probably be made to improve the validation.

    class XMLFileImporter {
      public $file; //Absolute path to import file
      public $import = array();
      public $xml;
      public $error = false;
    
      public function __construct($file) {
        $this->file = $file;
        $this->load();
      }
    
      public function load() {
        if(!is_readable($this->file)) {
          $this->error("File is not readable");
          return false;
        }
    
        $xml = simplexml_load_file($this->file);
        if(!$xml) {
          $this->error("XML could not be parsed");
          return false;
        }
        $this->xml = json_decode(json_encode($xml));
    
        return true;
      }
    
      public function import() {
        $count = $this->parseItems();
        echo "Imported $count rows";
    
      }
    
      public function parseItems() {
        if($this->error()){
          return false;
        }
    
        if(!self::validateXML($this->xml)) {
          $this->error("Invalid SimpleXML object");
          return false;
        }
    
        if(!self::validateArray($this->xml->Item)) {
          $this->error("Invalid Array 'Item' on SimpleXML object");
          return false;
        }
        $count = 0;
        foreach($this->xml->Item as $item) {
          if($this->parseItem($item)){
            $count++;
          }
        }
        return $count;
    
      }
      public function parseItem($item) {
        if($this->error()){
          return false;
        }
    
        if(!self::validateItem($item)) {
          $this->error("Invalid file item");
          return false;
        }
    
        $item = self::normalizeItem($item);
    
        $this->handlePlatform((string)$item->Platform);
        $this->handleGroup($item);
        $this->handleSubGroup($item);
        $this->handleFile($item);
        return true;
      }
    
      public function handlePlatform($platform) {
        if(!isset($this->import[$platform])) {
          $this->import[$platform] = array();
        }
    
        return true;
      }
    
      public function handleGroup($item) {
        if(!isset($this->import[$item->Platform][$item->Name])) {
          $this->import[$item->Platform][$item->Name] = array();
        }
        return true;
      }
    
      public function handleSubGroup($item) {
        if(!isset($this->import[$item->Platform][$item->Name][$item->Title])) {
          $this->import[$item->Platform][$item->Name][$item->Title] = array();
        }
        return true;
      }
    
      public function handleFile($item) {
        array_push($this->import[$item->Platform][$item->Name][$item->Title],$item->DownloadPath);
      }
    
      public function error($set=false) {
        if($set){
          $this->error = $set;
          return true;
        }
        return $this->error;
      }
    
      public static function validateXML($xml) {
        return is_object($xml);
      }
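      public static function validateArray($arr,$min=1){
        // json_decode() turns a repeated <Item> into a PHP array; a lone <Item>
        // arrives as an object and is rejected here instead of breaking count().
        return (is_array($arr) && count($arr) >= $min);
    
      }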
    
      public static function validateItem($item){
        return (isset($item->Title)
               && isset($item->Name)
               && isset($item->DownloadPath)
               && isset($item->Platform));
    
      }
    
      public static function normalizeItem($item){
        $item->Name = stripslashes(trim((string)$item->Name));
        $item->Title = stripslashes(trim((string)$item->Title));
        $item->Platform = (string)$item->Platform;
        $item->DownloadPath = (string)$item->DownloadPath;
    
        return $item;
      }
    
      public function output() {
        print_r($this->import);
        return true;
      }
    
    }
    
    $importer = new XMLFileImporter(dirname(__FILE__)."/files.xml");
    // load() already runs in the constructor, so it is not called again here.
    $importer->import();
    $importer->output();
    var_dump($importer->error());
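
    One possible validation tweak, as a sketch only: libxml can collect parse errors so they can be reported instead of the generic "XML could not be parsed". The malformed sample string and the $messages variable are assumptions made for this illustration.

```php
<?php
// Collect libxml parse errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Deliberately malformed sample: the closing </Item> tag is missing.
$xml = simplexml_load_string('<Items><Item><Name>Demo</Name></Items>');

$messages = array();
if ($xml === false) {
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError object with line and message properties.
        $messages[] = sprintf('Line %d: %s', $error->line, trim($error->message));
    }
    // Empty the error buffer so later parses start clean.
    libxml_clear_errors();
}

print_r($messages);
```

    In the class above, the collected messages could be passed to error() in load() in place of the fixed string.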
    
  • 2021-01-15 22:53

    You can try this:

    $scrape_xml = "files.xml";
    $xml = simplexml_load_file($scrape_xml);
    
    $group = array();
    
    foreach ($xml->Item as $file)
    {
        $platform = stripslashes($file->Platform);
        $name = stripslashes($file->Name);
        $title = stripslashes($file->Title);
        $downloadPath = stripslashes($file->DownloadPath);
    
        if(!isset($group[$platform]))
        {
            $group[$platform] = array();
            $group[$platform][] = array("Name" => $name,"Title" => $title, "Files" => array($downloadPath));
        }
        else
        {
            $found = false;
    
            for($i=0;$i<count($group[$platform]);$i++)
            {
                if($group[$platform][$i]["Name"] == $name  && $group[$platform][$i]["Title"] == $title)
                {
                    $group[$platform][$i]["Files"][] = $downloadPath;
                    $found = true;
                    break;
                }
            }
    
            if(!$found)
            {
                $group[$platform][] = array("Name" => $name,"Title" => $title, "Files" => array($downloadPath));
            }
        }
    }
    
    echo "<pre>".print_r($group,true)."</pre>";
    
  • 2021-01-15 22:55

    Start by declaring:

    $groups[stripslashes($file->Platform)][stripslashes($file->Name)]
      [stripslashes($file->Title)] = (object)array(
        'Name' => $file->Name,
        'Title' => $file->Title,
        'Files' => (object)array()
      );
    

    This will get you closer.

    You should also check the type of each XMLElement as you get it to see whether it's an array or a simple object, then treat it accordingly.
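
    As a rough sketch of such a check with SimpleXML (the inline XML and the Owner element are invented for this example): property access returns a SimpleXMLElement whether one element matches or several, and count() reports how many there are.

```php
<?php
// Hypothetical sample: one <Owner> element and two <Item> elements.
$xml = simplexml_load_string(
    '<Items><Owner>me</Owner><Item><Name>A</Name></Item><Item><Name>B</Name></Item></Items>'
);

// Both a single element and a repeated one come back as SimpleXMLElement.
var_dump($xml->Item instanceof SimpleXMLElement);  // bool(true)
var_dump($xml->Owner instanceof SimpleXMLElement); // bool(true)

// count() tells how many matching elements exist.
$itemCount = count($xml->Item);
$ownerCount = count($xml->Owner);

// foreach works the same way in both cases.
$names = array();
foreach ($xml->Item as $item) {
    $names[] = (string) $item->Name;
}
print_r($names);
```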

  • 2021-01-15 22:59

    You are merely mapping the input values into the output array by arranging them differently. This is your structure:

    Array(
      [... Item/Platform] => Array (
        [... Item/Title as 0-n] => array(
            "Name" => Item/Name,
            "Title" => Item/Title,
            "Files" => array(
                [...] => array(
                    "DownloadPath" => Item/DownloadPath
                ),
            )
        ),
      ),
    )
    

    The mapping can be done by iterating over the items within the XML and storing the values into the appropriate place in the new array (I named it $build):

    $build = array();
    foreach($items as $item)
    {
        $platform = (string) $item->Platform;
        $title = (string) $item->Title;
        isset($build[$platform][$title]) ?: $build[$platform][$title] = array(
            'Name' => (string) $item->Name,
            'Title' => $title
        );
        $build[$platform][$title]['Files'][] = array('DownloadPath' => (string) $item->DownloadPath);
    }
    $build = array_map('array_values', $build);
    

    The array_map call is done at the end to convert the Item/Title keys into numerical ones.
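
    A small sketch of that final step in isolation. The $build contents here are made-up data in the same platform => title => group shape, not output from the asker's XML.

```php
<?php
// Made-up data in the shape the loop builds: platform => title => group.
$build = array(
    'Windows' => array(
        'Title A' => array('Name' => 'One', 'Title' => 'Title A'),
        'Title B' => array('Name' => 'Two', 'Title' => 'Title B'),
    ),
    'Mac' => array(
        'Title A' => array('Name' => 'One', 'Title' => 'Title A'),
    ),
);

// array_values() drops the title keys of each platform's array and reindexes
// from 0; array_map() applies it per platform and keeps the platform keys.
$build = array_map('array_values', $build);

print_r(array_keys($build['Windows'])); // the keys are now 0 and 1
```

    The title keys are only needed while grouping, which is why they can be discarded once the loop has finished.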

    And that's it; here is the demo.

    Let me know if that's helpful.

    Edit: For your updated data, only a slight modification of the above is needed. The key principles of the previous example still apply; the extra duplication for each additional extension per item is handled by adding another iteration inside:

    $build = array();
    foreach($items as $item)
    {
        $platform = (string) $item->Platform;
        $title = (string) $item->Title;
        foreach(preg_split("~\s+~", $item->Ext) as $ext)
        {
            isset($build[$platform][$ext][$title])
                ?:$build[$platform][$ext][$title] = array(
                    'Name' => (string) $item->Name,
                    'Title' => $title
                );
            $build[$platform][$ext][$title]['Files'][]
                = array('DownloadPath' => (string) $item->DownloadPath);
        }
    }
    $build = array_map(function($v) {return array_map('array_values', $v);}, $build);
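
    The two new pieces can be tried separately, as in the sketch below. The $ext string and the $build data are invented for illustration, and the PREG_SPLIT_NO_EMPTY flag is an addition that the code above does not use.

```php
<?php
// Hypothetical Ext value: extensions separated by arbitrary whitespace.
$ext = "zip  rar\ttar";

// PREG_SPLIT_NO_EMPTY drops empty pieces caused by leading or trailing whitespace.
$extensions = preg_split('~\s+~', $ext, -1, PREG_SPLIT_NO_EMPTY);
print_r($extensions);

// Made-up three-level result: platform => extension => title => group.
$build = array(
    'Windows' => array(
        'zip' => array('Title A' => array('Name' => 'One', 'Title' => 'Title A')),
        'rar' => array('Title A' => array('Name' => 'One', 'Title' => 'Title A')),
    ),
);

// The nested array_map reindexes the title level only; the platform and
// extension keys are preserved.
$build = array_map(function ($byExt) {
    return array_map('array_values', $byExt);
}, $build);

print_r($build);
```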
    