user-defined-functions

Hive UDF Text to array

Submitted by 妖精的绣舞 on 2020-01-10 11:47:50
Question: I'm trying to create a UDF for Hive that gives me more functionality than the built-in split() function.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class LowerCase extends UDF {
    public Text evaluate(final Text text) {
        return new Text(stemWord(text.toString()));
    }

    /**
     * Stems words to normal form.
     *
     * @param word
     * @return Stemmed word.
     */
    private String stemWord(String word) {
        word = word.toLowerCase();
        // Remove special characters
```
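The truncated stemWord() lowercases the word and then removes special characters. That normalization step can be sketched on its own, here in Python for illustration (the regex is an assumption about what "special characters" means, since the Java excerpt cuts off):

```python
import re

def stem_word(word: str) -> str:
    """Lowercase, then drop everything that is not a letter or digit."""
    word = word.lower()
    return re.sub(r"[^a-z0-9]", "", word)

print(stem_word("Hello, World!"))  # helloworld
```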

Exponentiation of real numbers

Submitted by 纵饮孤独 on 2020-01-07 05:43:22
Question: I've come across an interesting exercise: implement a function x^y using the standard functions of Turbo Pascal. For integer exponents I can use a for loop, but I cannot see how to handle real exponents. I've thought about a Taylor series (though I can't see how to apply one to exponentiation), and I also found that x^y = exp(y*log(x)), but the standard functions only provide ln (the natural logarithm)... P.S. I'm not asking you to write code: give
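The identity quoted in the question already solves the exercise: Pascal's `ln` is the natural logarithm, which is exactly the logarithm the identity needs, so `exp(y * ln(x))` computes x^y for any x > 0. A minimal sketch of the idea, in Python for illustration:

```python
import math

def power(x: float, y: float) -> float:
    """Compute x**y for x > 0 via the identity x^y = exp(y * ln(x))."""
    if x <= 0:
        raise ValueError("identity only valid for x > 0")
    return math.exp(y * math.log(x))

print(power(2.0, 10.0))  # 1024.0, up to floating-point rounding
```

Negative bases need separate handling (they are only meaningful for integer exponents, where the for-loop approach works).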

PySpark : KeyError when converting a DataFrame column of String type to Double

Submitted by 生来就可爱ヽ(ⅴ<●) on 2020-01-07 03:00:16
Question: I'm trying to learn machine learning with PySpark. I have a dataset with a couple of String columns whose values are either True/False or Yes/No. I'm working with DecisionTree, and I want to convert these String values to the corresponding Double values, i.e. True and Yes should become 1.0, and False and No should become 0.0. I saw a tutorial where they did the same thing, and I came up with this code df = sqlContext.read.csv("C:/../churn-bigml-20.csv",inferSchema=True,header
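The mapping itself is simple; a plain-Python sketch of it follows (the column names are assumptions, since the question is truncated). In PySpark the function could be wrapped with `pyspark.sql.functions.udf` and a `DoubleType()` return type, or expressed directly with `when`/`otherwise` column expressions:

```python
def to_double(value: str) -> float:
    """Map truthy strings ("True", "Yes") to 1.0 and everything else to 0.0."""
    return 1.0 if value in ("True", "Yes") else 0.0

# Hypothetical row from a churn dataset, shown as a plain dict.
row = {"International plan": "Yes", "Churn": "False"}
converted = {col: to_double(v) for col, v in row.items()}
print(converted)  # {'International plan': 1.0, 'Churn': 0.0}
```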

VB.NET How to make a custom system time (real-time) [closed]

Submitted by 橙三吉。 on 2020-01-06 11:57:26
Question: Closed. This question needs details or clarity. It is not currently accepting answers. Closed 4 years ago. So I'm stuck on my project in VB.NET. I want to make a user-defined date-time function that is not dependent on the system time. I've tried searching the net, but all I found is how to print the current system time. Any idea how to make a customized date-time that updates every second in
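One common approach: keep a user-chosen start time plus the real elapsed time since the clock was created, so the displayed time ticks forward yet never reads the system date. Sketched in Python for illustration; in VB.NET the same idea would be a DateTime base plus a Stopwatch, refreshed by a Timer (all names here are assumptions, not the asker's code):

```python
import time
from datetime import datetime, timedelta

class CustomClock:
    """A clock that starts from a user-defined moment instead of the system date."""

    def __init__(self, start: datetime):
        self._start = start
        self._t0 = time.monotonic()  # real-time reference, immune to system-clock edits

    def now(self) -> datetime:
        elapsed = time.monotonic() - self._t0
        return self._start + timedelta(seconds=elapsed)

clock = CustomClock(datetime(1999, 12, 31, 23, 59, 0))
print(clock.now())  # a moment at the user-defined date, advancing in real time
```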

Fetch all the duplicate records from a list without quadratic time complexity?

Submitted by 馋奶兔 on 2020-01-06 05:44:13
Question: Below is my list, whose tuples contain the columns country, gender and age.

```scala
scala> funList
res1: List[(String, String, String)] = List((india,M,15), (usa,F,25), (australia,M,35), (kenya,M,55), (russia,M,75), (china,T,95), (england,F,65), (germany,F,25), (finland,M,45), (australia,F,35))
```

My goal is to find the duplicate records by the combination of (country, age). Please note that I want to fetch only the duplicate records and ignore the others. And the list should also contain the other column values
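Grouping by the (country, age) key does this in linear time: one pass to bucket the records, then keep only the buckets holding more than one entry. A sketch in Python using the question's data (the question is Scala, so treat this as the algorithm only; `groupBy` plus a size filter expresses the same thing there):

```python
from collections import defaultdict

records = [("india", "M", "15"), ("usa", "F", "25"), ("australia", "M", "35"),
           ("kenya", "M", "55"), ("russia", "M", "75"), ("china", "T", "95"),
           ("england", "F", "65"), ("germany", "F", "25"), ("finland", "M", "45"),
           ("australia", "F", "35")]

# Bucket records in O(n) by the (country, age) key.
groups = defaultdict(list)
for rec in records:
    groups[(rec[0], rec[2])].append(rec)

# Keep every record from any bucket of size > 1; gender tags along untouched.
duplicates = [rec for group in groups.values() if len(group) > 1 for rec in group]
print(duplicates)  # the two australia/35 records
```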

Spark dataframe - Replace tokens of a common string with column values for each row using scala

Submitted by 。_饼干妹妹 on 2020-01-06 05:24:31
Question: I have a dataframe with 3 columns: Number (Integer), Name (String), Color (String). Below is the result of df.show with the repartition option.

```scala
val df = sparkSession.read.format("csv").option("header", "true").option("inferschema", "true").option("delimiter", ",").option("decoding", "utf8").load(fileName).repartition(5).toDF()
```

```
+------+------+------+
|Number|  Name| Color|
+------+------+------+
|     4|Orange|Orange|
|     3| Apple| Green|
|     1| Apple|   Red|
|     2|Banana|Yellow|
|     5| Apple|   Red|
+------+----
```
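The title describes substituting tokens in one shared template string with each row's column values. A per-row sketch of that substitution in Python (the template and its token syntax are assumptions, since the question text is truncated); in Spark the same logic could live in a UDF applied over the three columns:

```python
rows = [(4, "Orange", "Orange"), (3, "Apple", "Green"),
        (1, "Apple", "Red"), (2, "Banana", "Yellow"), (5, "Apple", "Red")]

# Hypothetical common template whose tokens are replaced for every row.
template = "Item <Number>: the <Name> is <Color>"

def fill(template: str, number: int, name: str, color: str) -> str:
    """Replace each token in the template with the row's column value."""
    return (template.replace("<Number>", str(number))
                    .replace("<Name>", name)
                    .replace("<Color>", color))

messages = [fill(template, n, name, c) for n, name, c in rows]
print(messages[0])  # Item 4: the Orange is Orange
```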

How to call an extended procedure from a function

Submitted by 吃可爱长大的小学妹 on 2020-01-05 13:32:03
Question: Hi, I'm having trouble getting the following function to work.

```sql
CREATE FUNCTION test ( @nt_group VARCHAR(128) )
RETURNS @nt_usr TABLE (
    [name] [nchar](128) NULL,
    [type] [char](8) NULL,
    [privilege] [char](9) NULL,
    [mapped login name] [nchar](128) NULL,
    [permission path] [nchar](128) NULL
)
AS
BEGIN
    INSERT INTO @nt_usr
        EXEC master.dbo.xp_logininfo 'DOMAIN\USER', @nt_group
    RETURN
END
```

As far as I know I should be allowed to call an extended stored procedure, but I'm getting the following error