Hi I am trying to import a large XML file into a table on my sql server (2014)
I have used the code below for smaller files and thought it would be ok as this is a o
Try this. Just another method that I have used for some time. It's pretty fast (could be faster). I pull a huge xml db from a gaming company every night. This is how i get it an import it.
$xml = new XMLReader();
$xml->open($xml_file); // file is your xml file you want to parse
while($xml->read() && $xml->name != 'game') { ; } // get past the header to your first record (game in my case)
while($xml->name == 'game') { // now while we are in this record
$element = new SimpleXMLElement($xml->readOuterXML());
$gameRec = $this->createGameRecord($element, $os); // this is my function to reduce some clutter - and I use it elsewhere too
/* this looks confusing, but it is not. There are over 20 fields, and instead of typing them all out, I just made a string. */
$sql = "INSERT INTO $table (";
foreach($gameRec as $field=>$game){
$sql .= " $field,";
}
$sql = rtrim($sql, ",");
$sql .=") values (";
foreach($gameRec as $field=>$game) {
$sql .= " :$field,";
}
$sql = rtrim($sql,",");
$sql .= ") ON DUPLICATE KEY UPDATE "; // online game doesn't have a gamerank - not my choice LOL, so I adjust that for here
switch ($os) {
case 'pc' : $sql .= "gamerank = ".$gameRec['gamerank'] ; break;
case 'mac': $sql .= "gamerank = ".$gameRec['gamerank'] ; break;
case 'pl' : $sql .= "playercount = ".$gameRec['playercount'] ; break;
case 'og' :
$playercount = $this->getPlayerCount($gameRec['gameid']);
$sql .= "playercount = ".$playercount['playercount'] ;
break;
}
try {
$stmt = $this->connect()->prepare($sql);
$stmt->execute($gameRec);
} catch (PDOException $e) {// Kludge
echo 'os: '.$os.'<br/>table: '.$table.'<br/>XML LINK: '.$comprehensive_xml.'<br/>Current Record:<br/><pre>'.print_r($gameRec).'</pre><br/>'.
'SQL: '.$sql.'<br/>';
die('Line:33<br/>Function: pullBFG()<BR/>Cannot add game record <br/>'.$e->getMessage());
}
/// VERY VERY VERY IMPORTANT do not forget these 2 lines, or it will go into a endless loop - I know, I've done it. locks up your system after a bit hahaah
$xml->next('game');
unset($element);
}// while there are games
This should get you started. Obviously, adjust the "game" to your xml records. Trim out the fat I have here.
Here is the createGameRecord($element, $type='pc') Basically it turns it into an array to use elsewhere, and makes it easier to add it to the db. with a single line as seen above: $stmt->execute($gameRec); Where $gameRec was returned from this function. PDO knows gameRec is an array, and will parse it out as you INSERT IT. the "delHardReturns() is another of my fucntion that gets rid of those hard returns /r /n etc.. Seems to mess up the SQL. I think SQL has a function for that, but I have not pursed it. Hope you find this useful.
private function createGameRecord($element, $type='pc') {
if( ($type == 'pc') || ($type == 'og') ) { // player count is handled separately
$game = array(
'gamename' => strval($element->gamename),
'gameid' => strval($element->gameid),
'genreid' => strval($element->genreid),
'allgenreid' => strval($element->allgenreid),
'shortdesc' => $this->delHardReturns(strval($element->shortdesc)),
'meddesc' => $this->delHardReturns(strval($element->meddesc)),
'bullet1' => $this->delHardReturns(strval($element->bullet1)),
'bullet2' => $this->delHardReturns(strval($element->bullet2)),
'bullet3' => $this->delHardReturns(strval($element->bullet3)),
'bullet4' => $this->delHardReturns(strval($element->bullet4)),
'bullet5' => $this->delHardReturns(strval($element->bullet5)),
'longdesc' => $this->delHardReturns(strval($element->longdesc)),
'foldername' => strval($element->foldername),
'hasdownload' => strval($element->hasdownload),
'hasdwfeature' => strval($element->hasdwfeature),
'releasedate' => strval($element->releasedate)
);
if($type === 'pc') {
$game['hasvideo'] = strval($element->hasvideo);
$game['hasflash'] = strval($element->hasflash);
$game['price'] = strval($element->price);
$game['gamerank'] = strval($element->gamerank);
$game['gamesize'] = strval($element->gamesize);
$game['macgameid'] = strval($element->macgameid);
$game['family'] = strval($element->family);
$game['familyid'] = strval($element->familyid);
$game['productid'] = strval($element->productid);
$game['pc_sysreqos'] = strval($element->systemreq->pc->sysreqos);
$game['pc_sysreqmhz'] = strval($element->systemreq->pc->sysreqmhz);
$game['pc_sysreqmem'] = strval($element->systemreq->pc->sysreqmem);
$game['pc_sysreqhd'] = strval($element->systemreq->pc->sysreqhd);
if(empty($game['gamerank'])) $game['gamerank'] = 99999;
$game['gamesize'] = $this->readableBytes((int)$game['gamesize']);
}// dealing with PC type
if($type === 'og') {
$game['onlineiframeheight'] = strval($element->onlineiframeheight);
$game['onlineiframewidth'] = strval($element->onlineiframewidth);
}
$game['releasedate'] = substr($game['releasedate'],0,10);
} else {// not type = pl
$game['playercount'] = strval($element->playercount);
$game['gameid'] = strval($element->gameid);
}// no type = pl else
return $game;
}/
Updated: Much faster. I did some research, and while the above post I made shows one (slow) method, I was able to find one that works even faster - for me it does. I put this as a new answer due to the complete difference from my previous post.
LOAD XML LOCAL INFILE 'path/to/file.xml' INTO TABLE tablename ROWS IDENTIFIED BY '<xml-identifier>'
Example
<students>
<student>
<name>john doe</name>
<boringfields>bla bla bla......</boringfields>
</student>
</students>
Then, MYSQL command would be:
LOAD XML LOCAL INFILE 'path/to/students.xml' INTO TABLE tablename ROWS IDENTIFIED BY '<student>'
rows identified must have single quote and angle brackets. when I switched to this method, I went from 12min +/- to 30 seconds!! +/-
tips that worked for me. was use the DELETE FROM tablename otherwise it will just append to your db.
Ref: https://dev.mysql.com/doc/refman/5.5/en/load-xml.html
The max size of an XML column value in SQL Server is 2GB. It will not be possible to import a 2.5GB file into a single XML column.
UPDATE
Since your underlying objective is to transform XML elements within the file into table rows, you don't need to stage the entire file contents into a single XML column. You can avoid the 2GB limitation, reduce memory requirements, and improve performance by shredding the XML in client code and using a bulk insert technique to insert batches of multiple rows.
The example Powershell script below uses an XmlTextReader to avoid reading the entire XML into a DOM and uses SqlBulkCopy to insert batches of many rows at once. The combination of these techniques should allow you to insert millions rows in minutes rather than hours. These same techniques can be implemented in a custom app or SSIS script task.
I noticed a couple of the table columns specify varchar(1)
yet the XML attribute values contain many characters. You'll need to either expand length of the columns or transform the source values.
[String]$global:connectionString = "Data Source=YourServer;Initial Catalog=YourDatabase;Integrated Security=SSPI";
[System.Data.DataTable]$global:dt = New-Object System.Data.DataTable;
[System.Xml.XmlTextReader]$global:xmlReader = New-Object System.Xml.XmlTextReader("C:\FilesToImport\files.xml");
[Int32]$global:batchSize = 10000;
Function Add-FileRow() {
$newRow = $dt.NewRow();
$null = $dt.Rows.Add($newRow);
$newRow["Product_ID"] = $global:xmlReader.GetAttribute("Product_ID");
$newRow["path"] = $global:xmlReader.GetAttribute("path");
$newRow["Updated"] = $global:xmlReader.GetAttribute("Updated");
$newRow["Quality"] = $global:xmlReader.GetAttribute("Quality");
$newRow["Supplier_id"] = $global:xmlReader.GetAttribute("Supplier_id");
$newRow["Prod_ID"] = $global:xmlReader.GetAttribute("Prod_ID");
$newRow["Catid"] = $global:xmlReader.GetAttribute("Catid");
$newRow["On_Market"] = $global:xmlReader.GetAttribute("On_Market");
$newRow["Model_Name"] = $global:xmlReader.GetAttribute("Model_Name");
$newRow["Product_View"] = $global:xmlReader.GetAttribute("Product_View");
$newRow["HighPic"] = $global:xmlReader.GetAttribute("HighPic");
$newRow["HighPicSize"] = $global:xmlReader.GetAttribute("HighPicSize");
$newRow["HighPicWidth"] = $global:xmlReader.GetAttribute("HighPicWidth");
$newRow["HighPicHeight"] = $global:xmlReader.GetAttribute("HighPicHeight");
$newRow["Date_Added"] = $global:xmlReader.GetAttribute("Date_Added");
}
try
{
# init data table schema
$da = New-Object System.Data.SqlClient.SqlDataAdapter("SELECT * FROM dbo.files_index WHERE 0 = 1;", $global:connectionString);
$null = $da.Fill($global:dt);
$bcp = New-Object System.Data.SqlClient.SqlBulkCopy($global:connectionString);
$bcp.DestinationTableName = "dbo.files_index";
$recordCount = 0;
while($xmlReader.Read() -eq $true)
{
if(($xmlReader.NodeType -eq [System.Xml.XmlNodeType]::Element) -and ($xmlReader.Name -eq "file"))
{
Add-FileRow -xmlReader $xmlReader;
$recordCount += 1;
if(($recordCount % $global:batchSize) -eq 0)
{
$bcp.WriteToServer($dt);
$dt.Rows.Clear();
Write-Host "$recordCount file elements processed so far";
}
}
}
if($dt.Rows.Count -gt 0)
{
$bcp.WriteToServer($dt);
}
$bcp.Close();
$xmlReader.Close();
Write-Host "$recordCount file elements imported";
}
catch
{
throw;
}