I have data with Chinese characters in both the field names and the values. I imported them from xls into Access 2007 and exported them to ODBC. Then I use RODBC to read them in R, the
I'm not familiar with ODBC and RODBC, but my reading of the above snippet of documentation is that SET NAMES 'utf8'; is part of MySQL's SQL dialect, so you run it as you would any other SQL statement that you might use to retrieve data from your database.
Something like (not tested):
sqlQuery(myChannel, query = "SET NAMES 'utf8';")
where myChannel is the connection handle returned by odbcConnect().
Is there a reason you are using RODBC rather than the RMySQL package? I've had good experience using RMySQL for extensive data processing and for retrieving complex data sets, all from within R.
Update:
There is some evidence that, at least at one point, SET NAMES was deactivated in the MySQL ODBC driver. If you are confident you can read the characters via direct access to the database (via the mysql command-line client or one of MySQL's GUI front ends), then you could try to replicate what SET NAMES does. The following is from the MySQL manual:
A SET NAMES 'x' statement is equivalent to these three statements:
SET character_set_client = x;
SET character_set_results = x;
SET character_set_connection = x;
You could try executing those three SQL statements in place of SET NAMES and see if that works.
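A minimal sketch of that idea in R, assuming you already have an open RODBC channel such as myChannel above. The helper names set_names_statements() and emulate_set_names() are hypothetical; the second one simply sends each of the three statements through RODBC's sqlQuery().

```r
# Hypothetical helper: the three statements the MySQL manual lists as
# equivalent to SET NAMES 'x'.
set_names_statements <- function(charset = "utf8") {
  vars <- c("character_set_client",
            "character_set_results",
            "character_set_connection")
  sprintf("SET %s = %s", vars, charset)
}

# Hypothetical helper: send each statement over an open RODBC channel and
# keep whatever sqlQuery() returns, so any error messages can be inspected.
emulate_set_names <- function(channel, charset = "utf8") {
  results <- lapply(set_names_statements(charset),
                    function(stmt) RODBC::sqlQuery(channel, stmt))
  invisible(results)
}
```

For example, call emulate_set_names(myChannel) before running your SELECT queries. Building the statements separately lets you look at exactly what will be sent before anything reaches the server.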
The same manual also documents SET CHARACTER SET, which can be used in the same way as SET NAMES:
SET CHARACTER SET charset_name
SET CHARACTER SET is similar to SET NAMES but sets character_set_connection and collation_connection to character_set_database and collation_database. A SET CHARACTER SET x statement is equivalent to these three statements:
SET character_set_client = x;
SET character_set_results = x;
SET collation_connection = @@collation_database;
Setting collation_connection also sets character_set_connection to the character set associated with the collation (equivalent to executing SET character_set_connection = @@character_set_database). It is not necessary to set character_set_connection explicitly.
You could try using SET CHARACTER SET 'utf8' instead.
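A sketch of the same approach in R, again assuming an open RODBC channel. set_character_set() is a hypothetical helper: after sending SET CHARACTER SET it runs SHOW VARIABLES LIKE 'character_set%', so you can see which character_set_* values the session actually ended up with. If the driver is ignoring these statements, that should show up in the result.

```r
# Hypothetical helper: run SET CHARACTER SET on an open RODBC channel and
# return the session's character_set_* variables for checking.
set_character_set <- function(channel, charset = "utf8") {
  # Charset names are plain identifiers (utf8, latin1, gbk, ...); refuse
  # anything else before pasting it into SQL.
  if (!grepl("^[A-Za-z0-9_]+$", charset)) {
    stop("unexpected character set name: ", charset)
  }
  RODBC::sqlQuery(channel, paste("SET CHARACTER SET", charset))
  RODBC::sqlQuery(channel, "SHOW VARIABLES LIKE 'character_set%'")
}
```

For example, print(set_character_set(myChannel)) and check that character_set_client and character_set_results both read utf8.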
Finally, what character set / locale are you running in? It looks like you are on Windows; is this a UTF-8 locale? I also note some confusion in your question. You say you imported your data into MS Access and then exported it to ODBC. Do you mean you exported it to MySQL? I thought ODBC was a connection driver that allows communication with and between a range of databases, not something you could "export to".
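To answer the locale question from inside R, base R's Sys.getlocale() and l10n_info() report the current character-type locale and whether R treats it as UTF-8. A small sketch; locale_summary() is a hypothetical name.

```r
# Hypothetical helper: report the locale R is running in.
locale_summary <- function() {
  info <- l10n_info()
  list(
    ctype = Sys.getlocale("LC_CTYPE"),  # on Windows this typically ends in a code page, e.g. .936 for Simplified Chinese
    utf8  = isTRUE(info[["UTF-8"]]),    # does R treat the locale as UTF-8?
    mbcs  = isTRUE(info[["MBCS"]])      # is it a multibyte locale at all?
  )
}

locale_summary()
```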
Are your data really in MySQL? Could you not connect to MS Access via RODBC and read the data from there?
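If the data are still in the Access 2007 file, RODBC can read it directly on Windows through odbcConnectAccess2007(). A sketch under that assumption (the Access ODBC driver must be installed); read_access_table() is a hypothetical helper, and the path and table name in the usage line are placeholders.

```r
# Hypothetical helper: read one table straight from an Access 2007 file.
# odbcConnectAccess2007() is Windows-only and needs the Access ODBC driver.
read_access_table <- function(path, table) {
  if (!file.exists(path)) stop("no such file: ", path)
  ch <- RODBC::odbcConnectAccess2007(path)
  # On failure RODBC returns -1 rather than a channel object.
  if (!inherits(ch, "RODBC")) stop("could not open ", path)
  on.exit(RODBC::odbcClose(ch))
  RODBC::sqlFetch(ch, table)
}

# Usage (placeholder path and table name):
# df <- read_access_table("C:/data/mydata.accdb", "mytable")
```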
If the data are in MySQL, try using the RMySQL package to connect to the database and read the data.
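A sketch of that route through DBI and RMySQL. read_mysql_utf8() is a hypothetical helper and every connection detail is a placeholder; issuing SET NAMES utf8 right after connecting is an assumption about what your server needs (it may already default to UTF-8).

```r
# Hypothetical helper: read a table through RMySQL (via DBI) instead of ODBC.
# All connection details are placeholders; pass your own.
read_mysql_utf8 <- function(dbname, table, host = "localhost",
                            username = NULL, password = NULL) {
  con <- DBI::dbConnect(RMySQL::MySQL(), dbname = dbname, host = host,
                        username = username, password = password)
  on.exit(DBI::dbDisconnect(con))
  # Ask the server to use UTF-8 on this connection.
  DBI::dbExecute(con, "SET NAMES utf8")
  DBI::dbReadTable(con, table)
}

# Usage:
# df <- read_mysql_utf8("mydb", "mytable", username = "me", password = "secret")
```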
I just found the cure. I'm not sure whether I can post it here.
Set up the MySQL database to use UTF-8.
Set up the ODBC DSN and do NOT set its "character set" option.
Then connect with:
library(RODBC)
ch <- odbcConnect("mydb", DBMSencoding = "UTF-8")
That's it.
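DBMSencoding tells RODBC which encoding the database returns, so it can convert strings into the encoding R is running in. If strings still come back as UTF-8 bytes that R has not been told are UTF-8 (for example, with DBMSencoding left at its default), they can also be re-labelled after the fact. A minimal base-R sketch; as_utf8_marked() and mark_utf8() are hypothetical helpers, and they assume the bytes really are UTF-8, since they only set the encoding mark and do not convert anything.

```r
# Hypothetical helper: declare that a character vector's bytes are UTF-8.
# Only the encoding mark changes; the bytes are left exactly as they are.
as_utf8_marked <- function(x) {
  Encoding(x) <- "UTF-8"
  x
}

# Apply it to a data frame's column names, character columns and
# factor levels (sqlQuery() may return strings as factors).
mark_utf8 <- function(df) {
  names(df) <- as_utf8_marked(names(df))
  for (i in seq_along(df)) {
    col <- df[[i]]
    if (is.character(col)) {
      df[[i]] <- as_utf8_marked(col)
    } else if (is.factor(col)) {
      levels(col) <- as_utf8_marked(levels(col))
      df[[i]] <- col
    }
  }
  df
}
```

Usage, with a placeholder table name: df <- mark_utf8(sqlQuery(ch, "SELECT * FROM mytable")).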