large-data-volumes

Designing a web crawler

半城伤御伤魂 submitted on 2019-11-27 04:09:24
Question: I have come across an interview question "If you were designing a web crawler, how would you avoid getting into infinite loops?" and I am trying to answer it. How does it all work from the very beginning? Say Google started with some hub pages, hundreds of them (how these hub pages were found in the first place is a different sub-question). As Google follows links from a page and so on, does it keep a hash table to make sure that it doesn't follow pages it has already visited? What if the
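A typical answer to the loop question is to keep a set of already-visited URLs and only enqueue links that are not yet in it, normalizing each URL first so that cosmetic differences (fragments, trailing slashes) do not defeat the check. The sketch below is not from the thread; it is a minimal Java illustration in which fetchLinks stands in for whatever HTTP-fetch/HTML-parse layer is used and the normalization is deliberately rough:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Crawler {
    private final Set<String> seen = new HashSet<>();      // normalized URLs already queued
    private final Deque<String> frontier = new ArrayDeque<>();

    public void crawl(List<String> hubPages, int maxPages) {
        hubPages.forEach(this::enqueue);
        int fetched = 0;
        while (!frontier.isEmpty() && fetched++ < maxPages) {
            String url = frontier.poll();
            for (String link : fetchLinks(url)) {           // placeholder fetch/parse step
                enqueue(link);
            }
        }
    }

    private void enqueue(String url) {
        String key = normalize(url);
        if (seen.add(key)) {                                // false if already seen, so no revisit
            frontier.add(key);
        }
    }

    private String normalize(String url) {                  // deliberately rough normalization
        String u = url.split("#", 2)[0];
        return (u.endsWith("/") ? u.substring(0, u.length() - 1) : u).toLowerCase();
    }

    private List<String> fetchLinks(String url) {           // stand-in for real HTTP + HTML parsing
        return List.of();
    }
}

At web scale the in-memory HashSet would be replaced by a Bloom filter or a distributed store keyed by the normalized URL, and a depth or page budget guards against crawler traps that generate unlimited distinct URLs.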

Transferring large payloads of data (Serialized Objects) using wsHttp in WCF with message security

Deadly submitted on 2019-11-27 00:31:09
Question: I have a case where I need to transfer large amounts of serialized object graphs (via NetDataContractSerializer) over WCF using wsHttp. I'm using message security and would like to continue to do so. With this setup I would like to transfer a serialized object graph that can sometimes approach 300 MB or so, but when I try to do so I've started seeing an exception of type System.InsufficientMemoryException. After a little research it appears that by default in WCF a result to

How to do page navigation for many, many pages? Logarithmic page navigation

浪尽此生 submitted on 2019-11-26 21:59:41
What's the best way of displaying page navigation for many, many pages? (Initially this was posted as a how-to tip with my answer included in the question. I've now split my answer off into the "answers" section below). To be more specific: Suppose you're displaying a set of records to the user, broken up into fixed-size pages (like the results of a Google search, for example). If there are only a few pages, you can display a page navigation area at the end of the results that might look like this: [ << ] [<] 1 2 3 4 5 6 7 8 9 10 11 12 13 [ > ] [ >> ] But this quickly becomes unwieldy if there
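One way to read "logarithmic" navigation, sketched below as an assumption rather than the thread's actual answer: show every page in a small window around the current one, then links at roughly exponentially growing distances out to the first and last page, so even millions of pages need only a dozen or so links. Only the current page and the total page count are assumed known:

import java.util.List;
import java.util.TreeSet;

public class LogarithmicPager {
    // Pages to render: a dense window around `current`, plus pages at distances
    // 10, 100, 1000, ... in both directions, plus the first and last page.
    public static List<Integer> pageLinks(int current, int lastPage) {
        TreeSet<Integer> pages = new TreeSet<>();
        for (int p = Math.max(1, current - 2); p <= Math.min(lastPage, current + 2); p++) {
            pages.add(p);
        }
        for (int step = 10; step < lastPage; step *= 10) {
            if (current - step >= 1) pages.add(current - step);
            if (current + step <= lastPage) pages.add(current + step);
        }
        pages.add(1);
        pages.add(lastPage);
        return List.copyOf(pages);
    }

    public static void main(String[] args) {
        // Page 4,000 of 50,000 ->
        // [1, 3000, 3900, 3990, 3998, 3999, 4000, 4001, 4002, 4010, 4100, 5000, 14000, 50000]
        System.out.println(pageLinks(4_000, 50_000));
    }
}

The answers in the thread may space the extra links differently (doubling the gap instead of using powers of ten, for example), but the overall shape is the same: link density falls off the further you get from the current page.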

Using Hibernate's ScrollableResults to slowly read 90 million records

女生的网名这么多〃 submitted on 2019-11-26 15:11:47
I simply need to read each row in a table in my MySQL database using Hibernate and write a file based on it. But there are 90 million rows and they are pretty big. So it seemed like the following would be appropriate: ScrollableResults results = session.createQuery("SELECT person FROM Person person") .setReadOnly(true).setCacheable(false).scroll(ScrollMode.FORWARD_ONLY); while (results.next()) storeInFile(results.get()[0]); The problem is that the above will try to load all 90 million rows into RAM before moving on to the while loop... and that will kill my memory with OutOfMemoryError: Java heap
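The usual diagnosis is that MySQL Connector/J buffers the entire result set on the client by default, so ScrollableResults alone does not help here. A common workaround is to page through the table with setFirstResult/setMaxResults and clear the session between batches. A sketch under those assumptions, reusing the Person entity and storeInFile step from the question:

import java.util.List;
import org.hibernate.Session;

public class PersonExporter {
    private static final int PAGE_SIZE = 1_000;

    // Reads the table in fixed-size pages so at most PAGE_SIZE entities are in memory at once.
    public void export(Session session) {
        int offset = 0;
        while (true) {
            List<Person> page = session
                    .createQuery("SELECT person FROM Person person ORDER BY person.id", Person.class)
                    .setReadOnly(true)
                    .setFirstResult(offset)
                    .setMaxResults(PAGE_SIZE)
                    .list();
            if (page.isEmpty()) {
                break;
            }
            for (Person person : page) {
                storeInFile(person);      // the per-row work from the question
            }
            session.clear();              // evict the batch from the first-level cache
            offset += PAGE_SIZE;
        }
    }

    private void storeInFile(Person person) {
        // placeholder for the question's file-writing logic
    }
}

Offset-based paging slows down as the offset grows into the millions, so keyset pagination (WHERE person.id > :lastSeenId ORDER BY person.id) is the more scalable variant of the same idea.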

Is it possible to change argv or do I need to create an adjusted copy of it?

老子叫甜甜 submitted on 2019-11-26 09:49:42
Question: My application potentially has a huge number of arguments passed in, and I want to avoid the memory hit of duplicating the arguments into a filtered list. I would like to filter them in place, but I am pretty sure that messing with the argv array itself, or any of the data it points to, is probably not advisable. Any suggestions? Answer 1: Once argv has been passed into the main method, you can treat it like any other C array - change it in place as you like; just be aware of what you're doing with it.
