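The five-day Spark Summit North America 2020 took place from June 22 to June 26, 2020 (US time). Because of the COVID-19 pandemic, the conference was held online for the first time. Although it ran for five days, the first two were training days and only the last three were the conference proper. There were more than 210 sessions in total. As usual, the main theme was Spark + AI; on the AI side, the conference also covered popular software frameworks in depth, including Delta Lake, MLflow, TensorFlow, SciKit-Learn, Keras, PyTorch, DeepLearning4J, BigDL, and deep learning pipelines. For the full agenda, see: https://databricks.com/sparkaisummit/north-america-2020/agenda
The conference brought several important announcements, including Databricks' acquisition of Redash and the release of Delta Engine. The keynote slides have not been published yet, so if you are interested, watch the related videos instead. 过往记忆大数据 also published several write-ups of the keynotes over the past few days, which interested readers can look up. Over the next few days, this WeChat account will also introduce some of the more interesting sessions, so stay tuned.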
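The sessions covered the following topics:
- The roadmap for Apache Spark™, Delta Lake, MLflow, and Koalas
- Best practices for managing the machine learning lifecycle
- Tips for building reliable data pipelines at scale
- The latest developments in popular deep learning and machine learning frameworks
- Real-world AI use cases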
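How to download
Follow the WeChat official account 过往记忆大数据 or Java技术范 and reply with spark-9832 to get the slides.
Downloadable slides
Slides are available for the following sessions: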
- Data Science Across Data Sources with Apache Arrow
- Portable Scalable Data Visualization Techniques for Apache Spark and Python Notebook-based Analytics
- Native Support of Prometheus Monitoring in Apache Spark 3.0
- Performant Streaming in Production: Preventing Common Pitfalls when Productionizing Streaming Jobs
- Scaling Security Threat Detection with Apache Spark and Databricks
- User Defined Aggregation in Apache Spark: A Love Story
- Powering Interactive BI Analytics with Presto and Delta Lake
- Using AI to Support Proliferating Merchant Changes
- Tuning ML Models: Scaling, Workflows, and Architecture
- Battling Model Decay with Deep Learning and Gamification
- An Approach to Data Quality for Netflix Personalization Systems
- High-Performance Analytics with Probabilistic Data Structures: the Power of HyperLogLog
- Preventing Abuse Using Unsupervised Learning
- Geospatial Analytics at Scale: Analyzing Human Movement Patterns During a Pandemic
- Leveraging Apache Spark for Scalable Data Prep and Inference in Deep Learning
- Filtering vs Enriching Data in Apache Spark
- Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
- Deep Dive into GPU Support in Apache Spark 3.x
- Sputnik: Airbnb’s Apache Spark Framework for Data Engineering
- Patterns and Anti-Patterns for Memorializing Data Science Project Artifacts
- Automated and Explainable Deep Learning for Clinical Language Understanding at Roche
- Building Understanding Out of Incomplete and Biased Datasets using Machine Learning and Databricks
- Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA and Governance
- Managing ADLS gen2 using Apache Spark
- Using Apache Spark and Differential Privacy for Protecting the Privacy of the 2020 Census Respondents
- The 2020 Census and Innovation in Surveys
- Scaling Data and ML with Apache Spark and Feast
- The Apache Spark File Format Ecosystem
- Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL Pipeline
- A Production Quality Sketching Library for the Analysis of Big Data
- Children Safety Retrieval (CENSER) System for Retrieval of Kidnapped Children from Brothels in India
- Benchmark Tests and How-Tos of Convolutional Neural Network on HorovodRunner Enabled Apache Spark Clusters
- Scalable AutoML for Time Series Forecasting using Ray
- Using Machine Learning to Evolve Sports Entertainment
- Using Bayesian Generative Models with Apache Spark to Solve Entity Resolution Problems (DeDup, Merging, Uniqueness) at Scale
- Fine Tuning and Enhancing Performance of Apache Spark Jobs
- All In - Migrating a Genomics Pipeline from BASH/Hive to Spark (Azure Databricks) - A Real World Case Study
- Running Apache Spark on Kubernetes: Best Practices and Pitfalls
- Lessons Learned from Modernizing USCIS Data Analytics Platform
- On Improving Broadcast Joins in Apache Spark SQL
- Using Databricks as an Analysis Platform
- Is This Thing On? A Well State Model for the People
- Advanced Natural Language Processing with Apache Spark NLP
- Building a Streaming Microservice Architecture: with Apache Spark Structured Streaming and Friends
- Simplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
- Deploying Apache Spark Jobs on Kubernetes with Helm and Spark Operator
- Resource-Efficient Deep Learning Model Selection on Apache Spark
- Bring Satellite and Drone Imagery into your Data Science Workflows
- Scoring at Scale: Generating Follow Recommendations for Over 690 Million LinkedIn Members
- From HDFS to S3: Migrate Pinterest Apache Spark Clusters
- SparkCruise: Automatic Computation Reuse in Apache Spark
- Chromatic Sparse Learning
- Deploy and Serve Model from Azure Databricks onto Azure Machine Learning
- Cloud-Native Apache Spark Scheduling with YuniKorn Scheduler
- The Revolution Will be Streamed
- Democratizing PySpark for Mobile Game Publishing
- Ray: Enterprise-Grade, Distributed Python
- Fugue: Unifying Spark and Non-Spark Ecosystems for Big Data Analytics
- Enabling Scalable Data Science Pipeline with MLflow at Thermo Fisher Scientific
- Scaling Up AI Research to Production with PyTorch and MLflow
- Best Practices for Building Robust Data Platform with Apache Spark and Delta
- Building a Pipeline for State-of-the-Art Natural Language Processing Using Hugging Face Tools
- Designing the Next Generation of Data Pipelines at Zillow with Apache Spark
- Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
- Flash for Apache Spark Shuffle with Cosco
- Building a Real-Time Feature Store at iFood
When reposting this article, please add: reposted from 过往记忆 (https://www.iteblog.com/)
Link to this article: "Spark Summit North America 202006 HD Slides Download" (https://www.iteblog.com/archives/9832.html)
Source: oschina
Link: https://my.oschina.net/u/4302345/blog/4335407