So I have a Redshift database with enough tables that it feels worth my time to build a front end to make querying it a little bit easier than just typing in SQL commands.
You might want to consider http://www.looker.com/. It's a frontend for exploring your DB, allowing easily savable queries and a GUI that the business guys can also use.
This tutorial helps you setup a rails app with a redshift adapter:
https://www.credible.com/code/setting-up-a-data-warehouse-with-aws-redshift-and-ruby/
In a nutshell:
Clone the sample app:
git clone git@github.com:tuesy/redshift-ruby-tutorial.git
cd redshift-ruby-tutorial
Setup ENV Variables via ~/.bashrc
(or dotenv):
export REDSHIFT_HOST=redshift-ruby-tutorial.ccmj2nxbsay7.us-east-1.redshift.amazonaws.com
export REDSHIFT_PORT=5439
export REDSHIFT_USER=deploy
export REDSHIFT_PASSWORD=<your password here>
export REDSHIFT_DATABASE=analytics
export REDSHIFT_BUCKET=redshift-ruby-tutorial
Use the gem activerecord4-redshift-adapter, in Gemfile:
'activerecord4-redshift-adapter', '~> 0.2.0' # For Rails 4.2
'activerecord4-redshift-adapter', '~> 0.1.1' # For Rails 4.1
Then you can query into redshift like you would with a normal AR model:
bundle exec rails c
RedshiftUser.count
(Disclosure: I havn't yet given this method a try, but I may soon)
#app/models/data_warehouse.rb
class DataWarehouse < ActiveRecord::Base
establish_connection "redshift_staging"
#or, if you want to have a db per environment
#establish_connection "redshift_#{Rails.env}"
end
Note that we are connecting on 5439, not the default 5432 so I specify the port Also, I specify a schema, beta, which is what we use for our unstable aggregates, you could either have a different db per environment as mentioned above, or use various schemas and include them in the search path for ActiveRecord
#config/database.yml
redshift_staging:
adapter: postgresql
encoding: utf8
database: db03
port: 5439
pool: 5
schema_search_path: 'beta'
username: admin
password: supersecretpassword
host: db03.myremotehost.us #your remote host here, might be an aws url from Redshift admin console
###OPTION 2, a direct PG Connection
class DataWarehouse < ActiveRecord::Base
attr_accessor :conn
def initialize
@conn = PG.connect(
database: 'db03',
port: 5439,
pool: 5,
schema_search_path: 'beta',
username: 'admin',
password: 'supersecretpassword',
host: 'db03.myremotehost.us'
)
end
end
[DEV] main:0> redshift = DataWarehouse
E, [2014-07-17T11:09:17.758957 #44535] ERROR -- : PG::InsufficientPrivilege: ERROR: permission denied to set parameter "client_min_messages" to "notice" : SET client_min_messages TO 'notice'
(pry) output error: #<ActiveRecord::StatementInvalid: PG::InsufficientPrivilege: ERROR: permission denied to set parameter "client_min_messages" to "notice" : SET client_min_messages TO 'notice'>
UPDATE:
https://github.com/fiksu/activerecord-redshift-adapter
Reason 1: ActiveRecord postgresql adapter sets client_min_messages Reason 2: adapter also attempts to set Time Zone, which redshift doesn't allow (http://docs.aws.amazon.com/redshift/latest/dg/c_redshift-and-postgres-sql.html) Reason 3: Even if you change the code in ActiveRecord for the first two errors, you run into additional errors that complain that Redshift is using Postgresql 8.0, at that point I moved on to the adapter, will revisit and update if I find something better later.
#Gemfile:
gem 'activerecord4-redshift-adapter', github: 'aamine/activerecord4-redshift-adapter'
#config/database.yml
redshift_staging:
adapter: redshift
encoding: utf8
database: db03
port: 5439
pool: 5
username: admin
password: supersecretpassword
host: db03.myremotehost.us
timeout: 5000
#app/models/base_aggregates_redshift_test.rb
#Model named to match my tables in Redshift, if you want you can set_table like I have commented out below
class BaseAggregatesRedshiftTest < ActiveRecord::Base
establish_connection "redshift_staging"
self.table_name = "beta.base_aggregates_v2"
end
[DEV] main:0> redshift = BaseAggregatesRedshiftTest.first
D, [2014-07-17T15:31:58.678103 #43776] DEBUG -- : BaseAggregatesRedshiftTest Load (45.6ms) SELECT "beta"."base_aggregates_v2".* FROM "beta"."base_aggregates_v2" LIMIT 1
#app/models/base_aggregates_redshift_test.rb
class BaseAggregatesRedshiftTest < ActiveRecord::Base
set_table "beta.base_aggregates_v2"
ActiveRecord::Base.establish_connection(
adapter: 'redshift',
encoding: 'utf8',
database: 'staging',
port: '5439',
pool: '5',
username: 'admin',
password: 'supersecretpassword',
search_schema: 'beta',
host: 'db03.myremotehost.us',
timeout: '5000'
)
end
#in console, abbreviated example of first record, now it's using the new name for my redshift table, just assuming I've got the record at base_aggregates_redshift_tests because I didn't set the table_name
[DEV] main:0> redshift = BaseAggregatesRedshiftTest.first
D, [2014-07-17T15:09:39.388918 #11537] DEBUG -- : BaseAggregatesRedshiftTest Load (45.3ms) SELECT "base_aggregates_redshift_tests".* FROM "base_aggregates_redshift_tests" LIMIT 1
#<BaseAggregatesRedshiftTest:0x007fd8c4a12580> {
:truncated_month => Thu, 31 Jan 2013 19:00:00 EST -05:00,
:dma => "Cityville",
:group_id => 9712338,
:dma_id => 9999
}
Good luck @johncorser!