Cassandra Schema Design

Question

I'm continuing to explore Cassandra, and I would like to model a Student <=> Course relation, which is similar to a many-to-many relation in an RDBMS.

In terms of queries, I will need the following:

Retrieve all courses in which a student is enrolled.
Retrieve all students enrolled in a specific course.

Let's say that I create two column families, one for Course and another for Student.

CREATE COLUMN FAMILY student with comparator = UTF8Type AND key_validation_class=UTF8Type and column_metadata=[
{column_name:firstname,validation_class:UTF8Type},
{column_name:lastname,validation_class:UTF8Type},
{column_name:gender,validation_class:UTF8Type}];


CREATE COLUMN FAMILY course with comparator = UTF8Type AND key_validation_class=UTF8Type and column_metadata=[
{column_name:name,validation_class:UTF8Type},
{column_name:description,validation_class:UTF8Type},
{column_name:lecturer,validation_class:UTF8Type},
{column_name:assistant,validation_class:UTF8Type}];

Now how should I move on?

Should I create a third column family with a courseId:studentId composite key? If yes, can I use Hector to query by only one (left or right) component of the composite key?

Please help.

Update:

Following the suggestion, I created the following schema:

For Student:

CREATE COLUMN FAMILY student with comparator = UTF8Type and key_validation_class=UTF8Type and default_validation_class=UTF8Type;

Then we add some data:

set student['student.1']['firstName']='Danny';
set student['student.1']['lastName']='Lesnik';
set student['student.1']['course.1']='';
set student['student.1']['course.2']='';

Create a column family for Course:

CREATE COLUMN FAMILY course with comparator = UTF8Type and key_validation_class=UTF8Type and default_validation_class=UTF8Type;

Add some data:

set course['course.1']['name'] ='History';
set course['course.1']['description'] ='History Course';
set course['course.2']['name'] ='Algebra';
set course['course.2']['description'] ='Algebra Course';

And finally, StudentInCourse:

CREATE COLUMN FAMILY StudentInCourse with comparator = UTF8Type and key_validation_class=UTF8Type and default_validation_class=UTF8Type;

Add data:

set StudentInCourse['studentIncourse.1']['student.1'] =''; 
set StudentInCourse['studentIncourse.2']['student.1'] ='';

Answer 1:

I define a data model below, but it is easier to describe the object model first and then dive into the row model. From PlayOrm's perspective you would have:

import java.util.ArrayList;
import java.util.List;

public class Student {
  @NoSqlId
  private String id;
  private String firstName;
  private String lastName;
  @ManyToMany
  private List<Course> courses = new ArrayList<>(); // constructing avoids NullPointerExceptions
}

public class Course {
  @NoSqlId
  private String id;
  private String name;
  private String description;
  @ManyToOne
  private Lecturer lecturer;
  @ManyToMany
  private CursorToMany students = new CursorToManyImpl();
}

I could have used a List in Course, but I was concerned about running out of memory if too many students take a course over the years. Now let's jump to what PlayOrm does; you can do something similar if you like.

A single student row would look like this:

rowKey (the id in the above entity) = firstName='dean', lastName='hiller', courses.rowkey56=null, courses.78=null, courses.98=null, courses.101=null

This is a wide row: it has many columns whose names combine the field name ('courses') with the row key of the actual course.
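To make the wide-row idea concrete, here is a minimal in-memory sketch, not PlayOrm's actual code. It assumes the column names sort as plain ASCII strings (as they would under a UTF8Type comparator) and uses a TreeMap as a stand-in for the row; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// In-memory stand-in for one student row. Column names are kept sorted,
// which mimics how a comparator orders columns inside a Cassandra row.
final class StudentRowSketch {
    // Hypothetical naming convention taken from the example row above.
    static final String COURSE_PREFIX = "courses.";

    // Returns the course row keys encoded in the names of all columns
    // that start with "courses.".
    static List<String> courseKeys(SortedMap<String, String> row) {
        // Range slice: from the prefix itself (inclusive) up to the prefix
        // followed by the highest possible char (exclusive).
        SortedMap<String, String> slice =
                row.subMap(COURSE_PREFIX, COURSE_PREFIX + Character.MAX_VALUE);
        List<String> keys = new ArrayList<>();
        for (String columnName : slice.keySet()) {
            keys.add(columnName.substring(COURSE_PREFIX.length()));
        }
        return keys;
    }

    // Builds the example student row; values are empty because the
    // column name itself carries the foreign key.
    static SortedMap<String, String> sampleRow() {
        SortedMap<String, String> row = new TreeMap<>();
        row.put("firstName", "dean");
        row.put("lastName", "hiller");
        row.put("courses.rowkey56", "");
        row.put("courses.78", "");
        row.put("courses.98", "");
        row.put("courses.101", "");
        return row;
    }

    public static void main(String[] args) {
        // Keys come back in string order, not numeric order.
        System.out.println(courseKeys(sampleRow()));
    }
}
```

Against a real cluster, the same effect would come from a column slice over the "courses." name range rather than from a map lookup.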

The Course row is a bit more interesting. Because loading all the students for a single course could cause an out-of-memory error, the Course entity uses a cursor, which loads only 500 students at a time as you loop over it.

In this case PlayOrm keeps two rows backing the Course. Take the student row above: he was in course rowkey56, so let's describe that course:

rowkey56 = name='coursename', description='somedesc', lecturer='rowkey89ToLecturer'

Then there is another row in an index table for the students (it is a very wide row, so it supports up to millions of students):

indexrowForrowkey56InCourse = student34.56, student39.56, student23.56, ... (into the millions of students)
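Reading such an index row in batches can be sketched as follows. This is an illustration of the general paging idea only: a TreeSet stands in for the wide row, and the class name, method names, and the "resume after the last column seen" strategy are assumptions, not PlayOrm's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Pages through the column names of one wide index row, a batch at a time,
// so the caller never holds the whole row in memory at once.
final class IndexRowCursorSketch {
    private final NavigableSet<String> columns; // stand-in for the index row
    private final int batchSize;
    private String lastSeen; // last column returned; null before the first batch

    IndexRowCursorSketch(NavigableSet<String> columns, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.columns = columns;
        this.batchSize = batchSize;
    }

    // Returns up to batchSize column names that sort strictly after the
    // last one returned, or an empty list once the row is exhausted.
    List<String> nextBatch() {
        NavigableSet<String> remaining =
                lastSeen == null ? columns : columns.tailSet(lastSeen, false);
        List<String> batch = new ArrayList<>(batchSize);
        for (String name : remaining) {
            if (batch.size() == batchSize) {
                break;
            }
            batch.add(name);
        }
        if (!batch.isEmpty()) {
            lastSeen = batch.get(batch.size() - 1);
        }
        return batch;
    }

    // Builds a sample index row for course 56 with n student entries.
    static NavigableSet<String> sampleIndexRow(int n) {
        NavigableSet<String> row = new TreeSet<>();
        for (int i = 0; i < n; i++) {
            row.add("student" + i + ".56");
        }
        return row;
    }

    public static void main(String[] args) {
        IndexRowCursorSketch cursor =
                new IndexRowCursorSketch(sampleIndexRow(1200), 500);
        List<String> batch;
        while (!(batch = cursor.nextBatch()).isEmpty()) {
            System.out.println("loaded batch of " + batch.size());
        }
    }
}
```

With a real store, each call to nextBatch would be one column slice that starts after the last column name already seen and is limited to the batch size.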

If you want a course to have more than millions of students, though, you need to think about partitioning, whether you use PlayOrm or not. PlayOrm does the partitioning for you if you need it.
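If you handle partitioning yourself, one common approach is to spread a course's index over a fixed number of rows by hashing the student key into a bucket. The sketch below shows that generic technique; it is not how PlayOrm partitions, and the "courseKey:bucket" row-key layout and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Splits one logical index row per course into several physical rows
// by hashing the student key into a fixed number of buckets.
final class PartitionedIndexKeySketch {
    // Row key of the index row that holds the given student for the course.
    // Hypothetical layout: "<courseKey>:<bucket>".
    static String indexRowKey(String courseKey, String studentKey, int buckets) {
        // floorMod keeps the bucket non-negative even for negative hash codes.
        int bucket = Math.floorMod(studentKey.hashCode(), buckets);
        return courseKey + ":" + bucket;
    }

    // All index row keys that must be read to list every student of a course.
    static List<String> allIndexRowKeys(String courseKey, int buckets) {
        List<String> keys = new ArrayList<>(buckets);
        for (int bucket = 0; bucket < buckets; bucket++) {
            keys.add(courseKey + ":" + bucket);
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(indexRowKey("rowkey56", "student34", 16));
        System.out.println(allIndexRowKeys("rowkey56", 16).size() + " index rows to scan");
    }
}
```

The trade-off is that writes touch a single bucket row, while listing all students of a course means reading every bucket row.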

NOTE: If you don't know Hibernate or JPA: when you load the above Student, it loads a proxy list, so when you start looping over the courses, it goes back to the NoSQL store and loads the Courses so you don't have to ;).

In the case of Course, it loads a proxy Lecturer that is not filled in until you access a property such as lecturer.getName(). If you call lecturer.getId(), it doesn't need to load the lecturer, since it already has the id from the Course row.
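The lazy-loading behaviour described above can be sketched by hand. This is only an illustration of the proxy idea: the class name, the loader function, and the load counter are all hypothetical, and a real ORM generates such proxies for you.

```java
import java.util.function.Function;

// Hand-written stand-in for a lazily loaded Lecturer: the id is known from
// the Course row, everything else is fetched on first access.
final class LazyLecturerSketch {
    private final String id;
    // Hypothetical stand-in for "read the lecturer row from the store".
    private final Function<String, String> nameLoader;
    private String name;
    private boolean loaded;
    private int loadCount; // counts store reads, for demonstration only

    LazyLecturerSketch(String id, Function<String, String> nameLoader) {
        this.id = id;
        this.nameLoader = nameLoader;
    }

    // The id was stored in the Course row, so no store access is needed.
    String getId() {
        return id;
    }

    // First call triggers the load; later calls reuse the cached value.
    String getName() {
        if (!loaded) {
            name = nameLoader.apply(id);
            loaded = true;
            loadCount++;
        }
        return name;
    }

    int loadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        LazyLecturerSketch lecturer =
                new LazyLecturerSketch("rowkey89", key -> "lecturer-for-" + key);
        System.out.println(lecturer.getId() + " loads=" + lecturer.loadCount());
        System.out.println(lecturer.getName() + " loads=" + lecturer.loadCount());
    }
}
```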

EDIT (more detail): PlayOrm has three index tables: Decimal (stores double, float, etc. and BigDecimal), Integer (long, short, etc., BigInteger and boolean), and String. When you use CursorToMany, it uses one of those tables depending on the type of the FK. It also uses those tables for its Scalable-SQL language. The reason it uses a separate row for CursorToMany is so that clients don't get OutOfMemory when reading a row in, as the toMany could have one million FKs in it in some cases. CursorToMany then reads in batches from that index row.
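The choice between the three index tables can be pictured as a simple type dispatch. The sketch below only mirrors the grouping described above; the enum, its constant names, and the method are hypothetical and are not PlayOrm's API.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Maps the Java type of a foreign key to one of three index tables,
// following the Decimal / Integer / String grouping described above.
final class IndexTableChoiceSketch {
    enum IndexTable { DECIMAL, INTEGER, STRING } // hypothetical names

    private static final Set<Class<?>> DECIMAL_TYPES = new HashSet<>(Arrays.asList(
            double.class, Double.class, float.class, Float.class, BigDecimal.class));

    private static final Set<Class<?>> INTEGER_TYPES = new HashSet<>(Arrays.asList(
            long.class, Long.class, int.class, Integer.class,
            short.class, Short.class, byte.class, Byte.class,
            boolean.class, Boolean.class, BigInteger.class));

    // Anything that is neither decimal-like nor integer-like is treated
    // as a string key in this sketch.
    static IndexTable forKeyType(Class<?> keyType) {
        if (DECIMAL_TYPES.contains(keyType)) {
            return IndexTable.DECIMAL;
        }
        if (INTEGER_TYPES.contains(keyType)) {
            return IndexTable.INTEGER;
        }
        return IndexTable.STRING;
    }

    public static void main(String[] args) {
        System.out.println(forKeyType(String.class));
        System.out.println(forKeyType(long.class));
        System.out.println(forKeyType(BigDecimal.class));
    }
}
```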

later, Dean

Source: https://stackoverflow.com/questions/12591660/cassandra-schema-design

Tags

nosql

cassandra

hector