large-data-volumes

Designing a web crawler

半城伤御伤魂 submitted on 2019-11-27 04:09:24
Question: I have come across an interview question "If you were designing a web crawler, how would you avoid getting into infinite loops?" and I am trying to answer it. How does it all work from the very beginning? Say Google started with some hub pages, hundreds of them (how these hub pages were found in the first place is a different sub-question). As Google follows links from a page and so on, does it keep a hash table to make sure that it doesn't follow pages it has already visited? What if the
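A typical answer to the loop question is to keep a set of already-visited URLs and only enqueue links that are not yet in it, normalizing each URL first so that cosmetic differences (fragments, trailing slashes) do not defeat the check. The sketch below is not from the thread; it is a minimal Java illustration in which fetchLinks stands in for whatever HTTP-fetch/HTML-parse layer is used and the normalization is deliberately rough:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Crawler {
    private final Set<String> seen = new HashSet<>();      // normalized URLs already queued
    private final Deque<String> frontier = new ArrayDeque<>();

    public void crawl(List<String> hubPages, int maxPages) {
        hubPages.forEach(this::enqueue);
        int fetched = 0;
        while (!frontier.isEmpty() && fetched++ < maxPages) {
            String url = frontier.poll();
            for (String link : fetchLinks(url)) {           // placeholder fetch/parse step
                enqueue(link);
            }
        }
    }

    private void enqueue(String url) {
        String key = normalize(url);
        if (seen.add(key)) {                                // false if already seen, so no revisit
            frontier.add(key);
        }
    }

    private String normalize(String url) {                  // deliberately rough normalization
        String u = url.split("#", 2)[0];
        return (u.endsWith("/") ? u.substring(0, u.length() - 1) : u).toLowerCase();
    }

    private List<String> fetchLinks(String url) {           // stand-in for real HTTP + HTML parsing
        return List.of();
    }
}

At web scale the in-memory HashSet would be replaced by a Bloom filter or a distributed store keyed by the normalized URL, and a depth or page budget guards against crawler traps that generate unlimited distinct URLs.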

Transferring large payloads of data (Serialized Objects) using wsHttp in WCF with message security

Deadly submitted on 2019-11-27 00:31:09
Question: I have a case where I need to transfer large amounts of serialized object graphs (via NetDataContractSerializer) over WCF using wsHttp. I'm using message security and would like to continue to do so. With this setup I would like to transfer a serialized object graph that can sometimes approach 300 MB or so, but when I try to do so I've started seeing an exception of type System.InsufficientMemoryException. After a little research it appears that by default in WCF a result to

How to do page navigation for many, many pages? Logarithmic page navigation

浪尽此生 submitted on 2019-11-26 21:59:41
What's the best way of displaying page navigation for many, many pages? (Initially this was posted as a how-to tip with my answer included in the question. I've now split my answer off into the "answers" section below). To be more specific: Suppose you're displaying a set of records to the user, broken up into fixed-size pages (like the results of a Google search, for example). If there are only a few pages, you can display a page navigation area at the end of the results that might look like this: [ << ] [<] 1 2 3 4 5 6 7 8 9 10 11 12 13 [ > ] [ >> ] But this quickly becomes unwieldy if there
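One way to read "logarithmic" navigation, sketched below as an assumption rather than the thread's actual answer: show every page in a small window around the current one, then links at roughly exponentially growing distances out to the first and last page, so even millions of pages need only a dozen or so links. Only the current page and the total page count are assumed known:

import java.util.List;
import java.util.TreeSet;

public class LogarithmicPager {
    // Pages to render: a dense window around `current`, plus pages at distances
    // 10, 100, 1000, ... in both directions, plus the first and last page.
    public static List<Integer> pageLinks(int current, int lastPage) {
        TreeSet<Integer> pages = new TreeSet<>();
        for (int p = Math.max(1, current - 2); p <= Math.min(lastPage, current + 2); p++) {
            pages.add(p);
        }
        for (int step = 10; step < lastPage; step *= 10) {
            if (current - step >= 1) pages.add(current - step);
            if (current + step <= lastPage) pages.add(current + step);
        }
        pages.add(1);
        pages.add(lastPage);
        return List.copyOf(pages);
    }

    public static void main(String[] args) {
        // Page 4,000 of 50,000 ->
        // [1, 3000, 3900, 3990, 3998, 3999, 4000, 4001, 4002, 4010, 4100, 5000, 14000, 50000]
        System.out.println(pageLinks(4_000, 50_000));
    }
}

The answers in the thread may space the extra links differently (doubling the gap instead of using powers of ten, for example), but the overall shape is the same: link density falls off the further you get from the current page.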

Using Hibernate's ScrollableResults to slowly read 90 million records

女生的网名这么多〃 submitted on 2019-11-26 15:11:47
I simply need to read each row in a table in my MySQL database using Hibernate and write a file based on it. But there are 90 million rows and they are pretty big. So it seemed like the following would be appropriate: ScrollableResults results = session.createQuery("SELECT person FROM Person person") .setReadOnly(true).setCacheable(false).scroll(ScrollMode.FORWARD_ONLY); while (results.next()) storeInFile(results.get()[0]); The problem is that the above will try to load all 90 million rows into RAM before moving on to the while loop... and that will kill my memory with OutOfMemoryError: Java heap
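The usual diagnosis is that MySQL Connector/J buffers the entire result set on the client by default, so ScrollableResults alone does not help here. A common workaround is to page through the table with setFirstResult/setMaxResults and clear the session between batches. A sketch under those assumptions, reusing the Person entity and storeInFile step from the question:

import java.util.List;
import org.hibernate.Session;

public class PersonExporter {
    private static final int PAGE_SIZE = 1_000;

    // Reads the table in fixed-size pages so at most PAGE_SIZE entities are in memory at once.
    public void export(Session session) {
        int offset = 0;
        while (true) {
            List<Person> page = session
                    .createQuery("SELECT person FROM Person person ORDER BY person.id", Person.class)
                    .setReadOnly(true)
                    .setFirstResult(offset)
                    .setMaxResults(PAGE_SIZE)
                    .list();
            if (page.isEmpty()) {
                break;
            }
            for (Person person : page) {
                storeInFile(person);      // the per-row work from the question
            }
            session.clear();              // evict the batch from the first-level cache
            offset += PAGE_SIZE;
        }
    }

    private void storeInFile(Person person) {
        // placeholder for the question's file-writing logic
    }
}

Offset-based paging slows down as the offset grows into the millions, so keyset pagination (WHERE person.id > :lastSeenId ORDER BY person.id) is the more scalable variant of the same idea.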

Is it possible to change argv or do I need to create an adjusted copy of it?

老子叫甜甜 submitted on 2019-11-26 09:49:42
Question: My application potentially has a huge number of arguments passed in, and I want to avoid the memory hit of duplicating the arguments into a filtered list. I would like to filter them in place, but I am pretty sure that messing with the argv array itself, or any of the data it points to, is probably not advisable. Any suggestions? Answer 1: Once argv has been passed into the main method, you can treat it like any other C array - change it in place as you like; just be aware of what you're doing with it.
