Question
At the moment I iterate through all 5k files in the folder, store them in a tBufferOutput, read them back with a tBufferInput, sort them by mtime (the modified time on the FTP site) in descending order, and keep only the top 10 files.
Because this iterates through all 5k files at once, it is time-consuming and causes unnecessary latency on the remote FTP site.
Is there a simpler way, without iterating, to get just the latest 10 files from the FTP site directly, sorted by mtime descending, and then perform operations on them?
That is my current Talend job flow; I would appreciate advice on any other method that would improve the job's performance.
Basically, I don't want to iterate over all the files on the FTP site. Instead I want to get the top 10 directly from the remote FTP (tFTPFileList), check them against the DB, and download them later.
In short: is there any way, without iterating, to get just the latest 10 files using only the modified timestamp in descending order? Alternatively, I want to extract the files from the last 3 days from the remote FTP site.
The filenames are in this format: A_B_C_D_E_20200926053617.csv
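For illustration, the trailing 14 digits of such a name read like a yyyyMMddHHmmss timestamp. That is an assumption based on the single sample name above, and it may not match the FTP modified time. Under that assumption, a plain-Java sketch (standalone, not Talend code; the class and method names are hypothetical) could pick the newest names from a listing without any per-file mtime lookup:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LatestByName {
    // Assumed layout of the trailing digits in the file name: yyyyMMddHHmmss
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Parses the text between the last '_' and the ".csv" suffix as a timestamp.
    static LocalDateTime stampOf(String fileName) {
        int start = fileName.lastIndexOf('_') + 1;
        int end = fileName.lastIndexOf(".csv");
        return LocalDateTime.parse(fileName.substring(start, end), STAMP);
    }

    // Returns the n names with the newest embedded timestamps, newest first.
    static List<String> latest(List<String> names, int n) {
        return names.stream()
                .sorted((a, b) -> stampOf(b).compareTo(stampOf(a)))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "A_B_C_D_E_20200924010101.csv",
                "A_B_C_D_E_20200926053617.csv",
                "A_B_C_D_E_20200925120000.csv");
        for (String name : latest(names, 2)) {
            System.out.println(name);
        }
    }
}
```

This still needs the list of names, so it only helps if obtaining the listing is cheap compared with the per-file work done afterwards.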
Approach B (with Java): I tried the following tJava code for flow B:
Date lastModifiedDate = TalendDate.parseDate("EEE MMM dd HH:mm:ss zzz yyyy", row2.mtime_string);
Date current_date = TalendDate.getCurrentDate();
System.out.println(lastModifiedDate);
System.out.println(current_date);
System.out.println(((String)globalMap.get("tFTPFileList_1_CURRENT_FILE")));
if (TalendDate.diffDate(current_date, lastModifiedDate, "dd") <= 1) {
    // abs_path is assigned only for files modified within the last day
    output_row.abs_path = input_row.abs_path;
    System.out.println(output_row.abs_path);
}
Now tlogrow3 prints NULL values everywhere; please advise.
Answer 1:
Define 3 context variables (mask1, mask2 and mask3 in the code below).
In a tJava component, compute the file masks (with wildcards) for the 3 days, starting at the current date:
Date currentDate = TalendDate.getCurrentDate();
Date currentDateMinus1 = TalendDate.addDate(currentDate, -1, "dd");
Date currentDateMinus2 = TalendDate.addDate(currentDate, -2, "dd");
context.mask1 = "*" + TalendDate.formatDate("yyyyMMdd", currentDate) + "*.csv";
context.mask2 = "*" + TalendDate.formatDate("yyyyMMdd", currentDateMinus1) + "*.csv";
context.mask3 = "*" + TalendDate.formatDate("yyyyMMdd", currentDateMinus2) + "*.csv";
Then, in the tFTPFileList, use the 3 context variables (context.mask1, context.mask2, context.mask3) as file masks, so that only the files from today and the 2 previous days are retrieved.
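To check the mask logic outside a Talend job, the same computation can be sketched in plain Java with java.time instead of the TalendDate routines. This is a standalone illustration under that substitution, not Talend code; the class FileMasks and the method lastDaysMasks are hypothetical names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

class FileMasks {
    // Builds one wildcard mask per day, starting at `today` and going back
    // `days - 1` days, e.g. "*20200926*.csv" for 26 September 2020.
    static List<String> lastDaysMasks(LocalDate today, int days) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyyMMdd");
        List<String> masks = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            masks.add("*" + today.minusDays(i).format(fmt) + "*.csv");
        }
        return masks;
    }

    public static void main(String[] args) {
        // Prints the masks for today and the 2 previous days.
        for (String mask : lastDaysMasks(LocalDate.now(), 3)) {
            System.out.println(mask);
        }
    }
}
```

Passing the date in as a parameter, rather than reading the clock inside the method, keeps the result reproducible when the masks are checked against known file names.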
Source: https://stackoverflow.com/questions/64258690/how-to-just-extract-the-last-2-days-recent-files-from-tftpfilelist-based-on-modi