Question
I have a list of company names that I would like to turn into tickers. Here is the reproducible code to create the list of names that I have:
companynames=structure(list(V1 = structure(1:41, .Label = c("AETNA INC", "ANTHEM INC",
"APPLE INC", "ASPEN INSURANCE HOLDINGS LTD", "BARRICK GOLD CORP",
"BEST BUY CO INC", "CAREFUSION CORP", "CBS CORP-CLASS B NON VOTING",
"CIGNA CORP", "COMPUTER SCIENCES CORP", "COMPUWARE CORP", "COVENTRY HEALTH CARE INC",
"DELPHI AUTOMOTIVE PLC", "DST SYSTEMS INC", "EINSTEIN NOAH RESTAURANT GRO",
"ENSCO PLC-CL A", "EXPEDIA INC", "FIFTH STREET FINANCE CORP",
"GENERAL MOTORS CO", "GENWORTH FINANCIAL INC-CL A", "GREEN BRICK PARTNERS INC",
"HESS CORP", "HUMANA INC", "HUNTINGTON INGALLS INDUSTRIE", "LEGG MASON INC",
"MARKET VECTORS GOLD MINERS", "MARVELL TECHNOLOGY GROUP LTD",
"MICROSOFT CORP", "NCR CORPORATION", "NVR INC", "OAKTREE CAPITAL GROUP LLC",
"REPUBLIC AIRWAYS HOLDINGS IN", "SEAGATE TECHNOLOGY", "SPRINT COMMUNICATIONS INC",
"STARZ - A", "STATE BANK FINANCIAL CORP", "SYMMETRICOM INC",
"TESSERA TECHNOLOGIES INC", "UNITEDHEALTH GROUP INC", "VIRGIN MEDIA INC/OLD",
"XEROX CORP"), class = "factor")), .Names = "V1", class = "data.frame", row.names = c(NA,
-41L))
This gives me something along the lines of:
head(companynames)
V1
1 AETNA INC
2 ANTHEM INC
3 APPLE INC
4 ASPEN INSURANCE HOLDINGS LTD
5 BARRICK GOLD CORP
6 BEST BUY CO INC
I would like another column that outputs the ticker of each of these companies. So for the first row I should get AET, for the second row ANTM, for the third row AAPL, etc. My example is in R, but any solution in Python or R would be very helpful. I am not sure whether a function for this already exists, or what the best approach would be to write one if it does not.
Answer 1:
You can use @Joshua Ulrich's TTR package to get a mapping of company names to tickers and perform lookups against your companynames object. Ideally, your list of names would be accurate and properly formatted, but since it is not, you will have to do a bit of extra legwork to get some of the symbols. For example,
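# stockSymbols() downloads the current symbol list, so it needs internet access
stock.symbols <- TTR::stockSymbols()
# quick adjustments: uppercase the names and strip periods and commas
stock.symbols$adj_name <- gsub("[.,]", "", toupper(stock.symbols$Name))
##
# for each input name, take the symbol of the first listing whose adjusted name contains it
# (NA when nothing matches); fixed = TRUE treats the name as literal text, not a regex
companynames$Symbol <- sapply(as.character(companynames[, 1]), function(x) {
  stock.symbols$Symbol[grep(x, stock.symbols$adj_name, fixed = TRUE)[1]]
}, USE.NAMES = FALSE)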
##
R> na.omit(companynames)
# V1 Symbol
#1 AETNA INC AET
#2 ANTHEM INC ANTM
#3 APPLE INC AAPL
#5 BARRICK GOLD CORP ABX
#6 BEST BUY CO INC BBY
#9 CIGNA CORP CI
#10 COMPUTER SCIENCES CORP CSC
#13 DELPHI AUTOMOTIVE PLC DLPH
#14 DST SYSTEMS INC DST
#17 EXPEDIA INC EXPE
#18 FIFTH STREET FINANCE CORP FSC
#19 GENERAL MOTORS CO GM
#21 GREEN BRICK PARTNERS INC GRBK
#22 HESS CORP HES
#23 HUMANA INC HUM
#24 HUNTINGTON INGALLS INDUSTRIE HII
#25 LEGG MASON INC LM
#27 MARVELL TECHNOLOGY GROUP LTD MRVL
#28 MICROSOFT CORP MSFT
#29 NCR CORPORATION NCR
#30 NVR INC NVR
#31 OAKTREE CAPITAL GROUP LLC OAK
#32 REPUBLIC AIRWAYS HOLDINGS IN RJET
#33 SEAGATE TECHNOLOGY STX
#36 STATE BANK FINANCIAL CORP STBZ
#38 TESSERA TECHNOLOGIES INC TSRA
#39 UNITEDHEALTH GROUP INC UNH
#41 XEROX CORP XRX
So just using a few basic transformations (setting the Name column to uppercase and removing periods and commas), you can match 28 out of 41 of the inputs. Most of the remaining non-matching cases could probably be solved by simple substitutions in either your input names or the adj_name column of stock.symbols, e.g. CORP vs. CORPORATION. And as pointed out in the comments above, if you have companies that aren't traded on the NASDAQ, AMEX, or NYSE exchanges, you will have to pull in some more external data.
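As a rough sketch of the substitution idea, the snippet below normalizes common suffixes on both sides before matching. It is only an illustration under stated assumptions: normalize_name and the suffix list are hypothetical helpers, and the four-row stock.symbols table is a hand-made stand-in for the output of TTR::stockSymbols() (only its Symbol and Name columns are assumed), so it runs without a network connection.

```r
# Hypothetical stand-in for TTR::stockSymbols(); the name spellings are illustrative.
stock.symbols <- data.frame(
  Symbol = c("MSFT", "NCR", "STX", "XRX"),
  Name   = c("Microsoft Corp.", "NCR Corp", "Seagate Technology PLC",
             "Xerox Corporation"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: uppercase, strip periods and commas, unify common suffixes.
normalize_name <- function(x) {
  x <- toupper(as.character(x))
  x <- gsub("[.,]", "", x)
  suffixes <- c(CORPORATION = "CORP", INCORPORATED = "INC",
                LIMITED = "LTD", COMPANY = "CO")
  for (long in names(suffixes)) {
    x <- gsub(paste0("\\b", long, "\\b"), suffixes[[long]], x, perl = TRUE)
  }
  trimws(gsub("[[:space:]]+", " ", x))  # collapse repeated whitespace
}

inputs <- c("NCR CORPORATION", "XEROX CORP", "MICROSOFT CORP", "SEAGATE TECHNOLOGY")
key <- normalize_name(stock.symbols$Name)

# First pass: exact match on the normalized names.
idx <- match(normalize_name(inputs), key)

# Second pass: for the leftovers, accept a prefix match only if it is unambiguous.
for (i in which(is.na(idx))) {
  hit <- which(startsWith(key, normalize_name(inputs[i])))
  if (length(hit) == 1) idx[i] <- hit
}

result <- data.frame(V1 = inputs, Symbol = stock.symbols$Symbol[idx],
                     stringsAsFactors = FALSE)
print(result)
```

Exact matching after normalization avoids picking up an unrelated listing that merely contains the input as a substring, which is a risk when taking the first grep hit. The prefix fallback is a judgment call: it recovers names such as SEAGATE TECHNOLOGY whose listing carries an extra suffix, and it leaves ambiguous cases as NA for manual review.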
Source: https://stackoverflow.com/questions/32383585/turn-list-of-company-names-into-tickers