get column names Jet OLE DB in vb.net

前端 未结 1 628
佛祖请我去吃肉
佛祖请我去吃肉 2021-01-23 06:02

I\'ve written a function which reads csv files and parametrizes them accordingly, therefore i have a function gettypessql which queries sql table at first to get data types and

1条回答
  •  鱼传尺愫
    2021-01-23 06:41

    I think you have made it more complicated than it needs to be. For instance, you get the dbSchema by creating an empty DataTable and harvesting the Datatypes from it. Why not just use that first table rather than creating a new table from the Types? The table also need not be reconstructed over and over for each batch of rows imported.

    Generally since OleDb will try to infer types from the data, it seems unnecessary and may even get in the way in some cases. Also, you are redoing everything that OleDB does and copying data to a different DT. Given that, I'd skip the overhead OleDB imposes and work with the raw data.

    This creates the destination table using the CSV column name and the Type from the Database. If the CSV is not in the same column order as those delivered in a SELECT * query, it will fail.


    The following uses a class to map csv columns to db table columns so the code is not depending on the CSVs being in the same order (since they may be generated externally). My sample data CSV is not in the same order:

    Public Class CSVMapItem
    
        Public Property CSVIndex As Int32
        Public Property ColName As String = ""
       'optional
        Public Property DataType As Type
    
        Public Sub New(ndx As Int32, csvName As String,
                       dtCols As DataColumnCollection)
    
            CSVIndex = ndx
    
            For Each dc As DataColumn In dtCols
                If String.Compare(dc.ColumnName, csvName, True) = 0 Then
                    ColName = dc.ColumnName
                    DataType = dc.DataType
                    Exit For
                End If
            Next
    
            If String.IsNullOrEmpty(ColName) Then
                Throw New ArgumentException("Cannot find column: " & csvName)
            End If
        End Sub
    End Class
    

    The code to parse the csv uses CSVHelper but in this case the TextFieldParser could be used since the code just reads the CSV rows into a string array.

    Dim SQL = String.Format("SELECT * FROM {0} WHERE ID<0", DBTblName)
    Dim rowCount As Int32 = 0
    Dim totalRows As Int32 = 0
    Dim sw As New Stopwatch
    sw.Start()
    
    Using dbcon As New MySqlConnection(MySQLConnStr)
        Using cmd As New MySqlCommand(SQL, dbcon)
    
            dtSample = New DataTable
            dbcon.Open()
    
            ' load empty DT, create the insert command
            daSample = New MySqlDataAdapter(cmd)
            Dim cb = New MySqlCommandBuilder(daSample)
            daSample.InsertCommand = cb.GetInsertCommand
            dtSample.Load(cmd.ExecuteReader())
    
            ' dtSample is not only empty, but has the columns
            ' we need
    
            Dim csvMap As New List(Of CSVMapItem)
    
            Using sr As New StreamReader(csvfile, False),
                            parser = New CsvParser(sr)
    
                ' col names from CSV
                Dim csvNames = parser.Read()
                ' create a map of CSV index to DT Columnname  SEE NOTE
                For n As Int32 = 0 To csvNames.Length - 1
                    csvMap.Add(New CSVMapItem(n, csvNames(n), dtSample.Columns))
                Next
    
                ' line data read as string
                Dim data As String()
                data = parser.Read()
                Dim dr As DataRow
    
                Do Until data Is Nothing OrElse data.Length = 0
    
                    dr = dtSample.NewRow()
    
                    For Each item In csvMap
                        ' optional/as needed type conversion
                        If item.DataType = GetType(Boolean) Then
                            ' "1" wont convert to bool, but (int)1 will
                            dr(item.ColName) = Convert.ToInt32(data(item.CSVIndex).Trim)
                        Else
                            dr(item.ColName) = data(item.CSVIndex).Trim
                        End If
                    Next
                    dtSample.Rows.Add(dr)
                    rowCount += 1
    
                    data = parser.Read()
    
                    If rowCount = 50000 OrElse (data Is Nothing OrElse data.Length = 0) Then
                        totalRows += daSample.Update(dtSample)
                        ' empty the table if there will be more than 100k rows
                        dtSample.Rows.Clear()
                        rowCount = 0
                    End If
                Loop
            End Using
    
        End Using
    End Using
    sw.Stop()
    Console.WriteLine("Parsed and imported {0} rows in {1}", totalRows,
                        sw.Elapsed.TotalMinutes)
    

    The processing loop updates the DB every 50K rows in case there are many many rows. It also does it in one pass rather than reading N rows thru OleDB at a time. CsvParser will read one row at a time, so there should never be more than 50,001 rows worth of data on hand at a time.

    There may be special cases to handle for type conversions as shown with If item.DataType = GetType(Boolean) Then. A Boolean column read in as "1" cant be directly passed to a Boolean column, so it is converted to integer which can. There could be other conversions such as for funky dates.

    Time to process 250,001 rows: 3.7 mins. An app which needs to apply those string transforms to every single string column will take much longer. I'm pretty sure that using the CsvReader in CSVHelper you could have those applied as part of parsing to a Type.


    There is a potential disaster waiting to happen since this is meant to be an all-purpose importer/scrubber.

    For i As Integer = 0 To dt.Columns.Count - 1
        Dim columnName As String = firstRow(i).ToString()
        Dim newColumn As New DataColumn(columnName, mListOfTypes(i))
        getData.Columns.Add(newColumn)
    Next
    

    Both the question and the self-answer build the new table using the column names from the CSV and the DataTypes from a SELECT * query on the destination table. So, it assumes the CSV Columns are in the same order that SELECT * will return them, and that all CSVs will always use the same names as the tables.

    The answer above is marginally better in that it finds and matches based on name.

    A more robust solution is to write a little utility app where a user maps a DB column name to a CSV index. Save the results to a List(Of CSVMapItem) and serialize it. There could be a whole collection of these saved to disk. Then, rather than creating a map based on dead reckoning, just deserialize the desired for user as the csvMap in the above code.

    0 讨论(0)
提交回复
热议问题