Cassandra + PHP + Thrift + retrieve multiple rows bad performance

Submitted by 久未见 on 2019-12-22 18:38:50

Question


I'm new to Cassandra, and I'm trying to retrieve multiple rows using PHP, but the performance is really poor.

Here is the code I'm using:


<?php
$GLOBALS['THRIFT_ROOT'] = 'D:/cassandra/thrift/lib/php/src';
require_once $GLOBALS['THRIFT_ROOT'].'/packages/cassandra/Cassandra.php';
require_once $GLOBALS['THRIFT_ROOT'].'/packages/cassandra/cassandra_types.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TSocket.php';
require_once $GLOBALS['THRIFT_ROOT'].'/protocol/TBinaryProtocol.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TFramedTransport.php';
require_once $GLOBALS['THRIFT_ROOT'].'/transport/TBufferedTransport.php';
try {

    $ipmachine = 'localhost';
    $keyspace = 'demo';
    $field_search = 'id_log';
    $column_family = 'logs';
    // Make a connection to the Thrift interface to Cassandra
    $socket = new TSocket($ipmachine, 9160);
    $transport = new TFramedTransport($socket, 1024, 1024);
    $protocol = new TBinaryProtocol($transport);
    $client = new cassandra_CassandraClient($protocol);
    $transport->open();

    $consistency_level = ConsistencyLevel::ONE;

    $client->set_keyspace($keyspace);

    // Specify what Column Family to query against.
    $columnParent = new cassandra_ColumnParent();
    $columnParent->column_family = $column_family;
    $columnParent->super_column = NULL;
    $sliceRange = new cassandra_SliceRange();
    $sliceRange->start = "";
    $sliceRange->finish = "";    

    $predicate = new cassandra_SlicePredicate();
    $predicate->slice_range = $sliceRange;
    $numelements = 100;

    $keyRange = new cassandra_KeyRange();
    $keyRange->start_key= "";
    $keyRange->end_key = "";
    $keyRange->count =$numelements;

    $result = $client->get_range_slices($columnParent, $predicate, $keyRange, $consistency_level);

    if (!empty($result)) {
        $continue = 1;
        $start_key = 1;
        while ($continue <= 5) {

            $keyRange = new cassandra_KeyRange();
            $keyRange->start_key = $start_key;
            $keyRange->end_key = "";
            $keyRange->count = $numelements;

            // Timestamp (with microseconds) just before the call
            $t = microtime(true);
            $micro = sprintf("%06d", ($t - floor($t)) * 1000000);
            $d1 = new DateTime(date('Y-m-d H:i:s.' . $micro, $t));
            $now = $d1->format("H:i:s.u");

            echo $now . '................';
            $result = $client->get_range_slices($columnParent, $predicate, $keyRange, $consistency_level);

            // Timestamp just after the call
            $t = microtime(true);
            $micro = sprintf("%06d", ($t - floor($t)) * 1000000);
            $d2 = new DateTime(date('Y-m-d H:i:s.' . $micro, $t));
            $now = $d2->format("H:i:s.u");

            echo $now . '<br>';
            // DO SOMETHING WITH THE DATA AND CHANGE 
            $start_key = $start_key * $numelements;
            $continue++;

        }
    }

    $transport->close();

} catch (TException $tx) {
   print 'TException: '.$tx->getLine(). '<br>Error: '.$tx->getMessage();
   print '<br>Code '.$tx->getCode(). '<br>traza: '.$tx->getTraceAsString();
}
?>


This is the output I get:

Start time................End time
19:13:39.534957................19:13:40.220973
19:13:40.221050................19:13:40.892968
19:13:40.893044................19:13:41.575102
19:13:41.575181................19:13:42.256830
19:13:42.256906................19:13:42.936492

So retrieving 5 blocks of 100 rows took about 3.4 seconds, roughly 0.68 seconds per call.
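
For reference, the per-call latency can be measured more directly by subtracting two `microtime(true)` values rather than formatting timestamps. The following is only a minimal sketch: `timeCall` is a hypothetical helper that is not part of the script above, the `usleep()` call merely stands in for `get_range_slices`, and the closing estimate assumes every call keeps costing about 0.68 seconds per 100 rows, as in the output shown.

```php
<?php
// Hypothetical helper, not part of the original script: run a callable and
// return its result together with the elapsed wall-clock time in seconds.
function timeCall($fn)
{
    $start = microtime(true);
    $result = call_user_func($fn);
    return array($result, microtime(true) - $start);
}

// Stand-in for $client->get_range_slices(...); usleep() simulates latency.
list($rows, $seconds) = timeCall(function () {
    usleep(50000); // pretend the call takes about 50 ms
    return range(1, 100);
});
printf("%d rows in %.3f s\n", count($rows), $seconds);

// Back-of-the-envelope estimate from the timings in the question,
// assuming every call keeps costing about 0.68 s for 100 rows.
$secondsPerCall = 0.68;
$rowsPerCall = 100;
$totalRows = 100000;
$calls = (int) ceil($totalRows / $rowsPerCall);
printf("%d calls, about %.0f s in total\n", $calls, $calls * $secondsPerCall);
```

Under that assumption, 100,000 rows would need 1,000 calls, which is on the order of eleven minutes.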

How can I improve the performance? Is there another way to retrieve data from Cassandra over Thrift besides get_range_slices?

I have also tried a larger count than 100 elements, but it takes more or less the same time.

I need to retrieve more than 100,000 rows, so as you can imagine this scales terribly.
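
One detail of the loop above is worth illustrating: with `get_range_slices`, paging is normally done by passing the last key of the previous page as the next `start_key`, not by multiplying a number. The sketch below shows only that pattern against an in-memory list. `fetchRange` and `fetchAllKeys` are hypothetical stand-ins, not Thrift calls. In the real client each result element is a `KeySlice`, so the next start key would be taken from the `key` field of the last element, and rows come back in partitioner order, which is not necessarily sorted by key.

```php
<?php
// Hypothetical stand-in for $client->get_range_slices(): returns up to
// $count keys from a sorted in-memory list, starting at $startKey (inclusive).
function fetchRange(array $sortedKeys, $startKey, $count)
{
    $page = array();
    foreach ($sortedKeys as $key) {
        if ($startKey !== '' && strcmp($key, $startKey) < 0) {
            continue; // key lies before the requested start
        }
        $page[] = $key;
        if (count($page) >= $count) {
            break;
        }
    }
    return $page;
}

// Walk the whole range page by page, reusing the last key seen as the
// next start key.
function fetchAllKeys(array $sortedKeys, $pageSize)
{
    if ($pageSize < 2) {
        // With one key per page, every page would only repeat the start key.
        throw new InvalidArgumentException('page size must be at least 2');
    }
    $all = array();
    $startKey = '';
    while (true) {
        $page = fetchRange($sortedKeys, $startKey, $pageSize);
        // The start key is inclusive, so every page after the first begins
        // with the last key of the previous page; drop that duplicate.
        if ($startKey !== '') {
            array_shift($page);
        }
        if (empty($page)) {
            break; // nothing new was returned: end of the range
        }
        foreach ($page as $key) {
            $all[] = $key;
        }
        $startKey = end($page);
    }
    return $all;
}

$keys = array('k01', 'k02', 'k03', 'k04', 'k05', 'k06', 'k07');
print_r(fetchAllKeys($keys, 3));
```

With a page size of 3, the seven sample keys are each returned exactly once, at the cost of one repeated key per page after the first.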

Source: https://stackoverflow.com/questions/10724881/cassandra-php-thrift-retrieve-multiple-rows-bad-performance
