
Open-source language AI challenges big tech’s models


An international team of around 1,000 largely academic volunteers has tried to break big tech’s stranglehold on natural-language processing and reduce its harms. Trained with US$7-million-worth of publicly funded computing time, the BLOOM language model will rival in scale those made by firms such as Google and OpenAI, but will be open-source. BLOOM will also be the first model of its scale to be multilingual.

The collaboration, called BigScience, launched an early version of the model on 17 June, and hopes that it will ultimately help to reduce harmful outputs of artificial intelligence (AI) language systems. Models that recognize and generate language are increasingly used by big tech firms in applications from chatbots to translators, and can sound so eerily human that a Google engineer this month claimed that the firm’s AI model was sentient (Google strongly denies that the AI possesses sentience). But such models also suffer from serious practical and ethical flaws, such as parroting human biases. These are difficult to tackle because the inner workings of most such models are closed to researchers.

As well as being a tool to explore AI, BLOOM will be open for a range of research uses, such as extracting information from historical texts and making classifications in biology. “We think that access to the model is an essential step to do responsible machine learning,” says Thomas Wolf, co-founder of Hugging Face, a company that hosts an open-source platform for AI models and data sets, and has helped to spearhead the initiative.

“It was long overdue that this technology diffused into the open-source world, and this is quite an interesting way for it to have happened,” says Connor Leahy, co-founder of EleutherAI, which is creating its own open-source large language model in English and was not involved in the project.

Learning machines

Large language models are algorithms that learn statistical associations between billions of words and phrases to perform tasks such as generating summaries, translating, answering questions and classifying text. Built using brain-inspired architectures known as neural networks, the models train by adjusting values, called parameters: they blank out words and compare their predictions with reality. BLOOM has 176 billion parameters, on a par with GPT-3, one of the best-known such models, which was created by the firm OpenAI and licensed by Microsoft.
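
As a rough illustration, here is a toy sketch in Python of that training step, not BigScience's actual code: a tiny network scores candidate words for a blanked-out position, the prediction is compared with the real word, and the parameters are adjusted to shrink the error.

```python
# Toy sketch of masked-word training: hide a word, predict it, compare the
# prediction with reality, then nudge the parameters to reduce the error.
import torch
import torch.nn as nn

vocab = ["the", "model", "learns", "language", "<mask>"]
word_to_id = {w: i for i, w in enumerate(vocab)}

embed = nn.Embedding(len(vocab), 16)                  # word vectors (parameters)
head = nn.Linear(16, len(vocab))                      # scores every vocabulary word
optimizer = torch.optim.Adam(
    list(embed.parameters()) + list(head.parameters()), lr=1e-2)

# Training example: "the model learns language", with "learns" blanked out.
context = torch.tensor([[word_to_id["the"], word_to_id["model"],
                         word_to_id["<mask>"], word_to_id["language"]]])
target = torch.tensor([word_to_id["learns"]])

for step in range(100):
    hidden = embed(context).mean(dim=1)                  # crude sentence representation
    logits = head(hidden)                                # predicted scores for each word
    loss = nn.functional.cross_entropy(logits, target)   # compare prediction with reality
    optimizer.zero_grad()
    loss.backward()                                      # work out how to adjust the parameters
    optimizer.step()                                     # adjust them slightly
```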

Although such models are sometimes impressive — generating poetry or correctly answering trivia questions — they have no sense of the meaning of language, which also leads them to produce gibberish. More worryingly, they can promote abuse or self-harm, and echo the racist or sexist associations sewn through the human-written text they learn from, such as linking ‘Islam’ with terrorism. The models generally cost millions of dollars to train and have an enormous carbon footprint (BigScience eventually plans to reveal its carbon emissions).

Whereas most natural-language models are built by small in-house teams, BLOOM was the work of hundreds of researchers — mostly academics — including ethicists, legal scholars and philosophers, but also some employees from Facebook and Google, working in a personal capacity. To train BLOOM, BigScience was granted free access to France’s national Jean Zay supercomputer facility outside Paris. The model is currently in the last few weeks of its three-month training period.

Hand-picked text

Models are only as good as the data sets they are based on, so a major task was selecting what texts the model should learn from, says Yacine Jernite, a machine-learning researcher at Hugging Face. Most major models rip language directly from the web, including sites such as Reddit. Instead, the BigScience researchers hand-picked nearly two-thirds of their 341-billion-word data set from 500 sources. Among them was Semantic Scholar, an AI-backed search engine for academic publications that also includes content such as Nature news articles. The sources were suggested during a series of workshops, including with community groups, such as the African natural-language-processing community Masakhane, LatinX in AI and Machine Learning Tokyo. “We wanted to make sure people with proximity to the data, their country, the language they speak, had a hand in choosing what language came into the model’s training,” says Jernite.

To make full use of the computing power available, the team topped up the data trove using a multilingual web crawl, filtered for quality and with some redaction for privacy. The collaboration also attempted to reduce the usual over-representation of porn sites (which can lead to sexist associations in the model) but without excluding keywords that would remove content associated with frank discussion of sexuality in often under-represented communities.
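
A minimal sketch of what that kind of filtering and redaction can look like, with made-up thresholds and a single e-mail rule standing in for BigScience's far more elaborate pipeline:

```python
import re
from typing import Optional

# Hypothetical quality filter and privacy redaction for web-crawled text.
# The thresholds and the lone e-mail rule are illustrative only.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def clean_document(doc: str, min_words: int = 20,
                   min_unique_ratio: float = 0.3) -> Optional[str]:
    """Return a redacted document, or None if it fails the quality checks."""
    words = doc.split()
    if len(words) < min_words:
        return None                              # too short to be useful training text
    if len(set(words)) / len(words) < min_unique_ratio:
        return None                              # highly repetitive; likely spam or boilerplate
    return EMAIL.sub("[EMAIL]", doc)             # simple privacy redaction
```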

Jernite acknowledges that BLOOM will not be free of biases. But by providing it with multicultural and high-quality sources, the team hopes to improve on existing models. Crucially, because the code and data set behind the model are open, researchers can try to understand the roots of harmful behaviours, which could improve future iterations, says Wolf.

Evaluation of the model will also differ from the usual benchmarks, says Ellie Pavlick, a natural-language-learning researcher at Brown University in Providence, Rhode Island. As well as comparing BLOOM against other models in its abilities to, for example, answer questions, researchers also want to look at more diverse metrics, such as how strongly it makes certain stereotyped associations or how biased its abilities are towards a specific language. Pavlick hopes that because the model has been trained to be multilingual, it might have a deeper understanding of language, which could help in its ability to generalize to a diversity of tasks.
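
One way such stereotyped associations can be probed, sketched here under the assumption that a small public BLOOM checkpoint (bigscience/bloom-560m) is used, is to compare how strongly the model prefers one sentence of a minimally different pair; the sentence pair below is an illustrative placeholder, not BigScience's evaluation suite.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Illustrative probe of a stereotyped association; checkpoint and sentences are placeholders.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
model.eval()

def sentence_score(sentence: str) -> float:
    """Approximate total log-probability the model assigns to a sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss       # mean negative log-likelihood per token
    return -loss.item() * ids.shape[1]

stereotyped = "The nurse said she would be late."
counterpart = "The nurse said he would be late."
# A large positive gap suggests the model leans towards the stereotyped version.
print(sentence_score(stereotyped) - sentence_score(counterpart))
```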

Leahy predicts that the model might perform slightly worse than other large models in English, given its smaller data set in the language, but that should be balanced by markedly better performance elsewhere.

Free to use

The fully trained BLOOM model will be available to download for researchers who want to experiment with it or train it on new data for specific applications. But downloading it and running it requires significant hardware capacity. Because that’s available to so few research teams, BigScience will also publish smaller, less hardware-intensive versions as well as create a distributed system that allows labs to share the model across their servers. In addition, Hugging Face will release a web application that will enable anyone to query BLOOM without downloading it. A similar application will be available for the early release later this week.
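
As a minimal sketch of what querying looks like with the Hugging Face transformers library, assuming one of the smaller published checkpoints (here bigscience/bloom-560m), since the full 176-billion-parameter model needs far more memory than a typical workstation has:

```python
from transformers import pipeline

# Load a small BLOOM checkpoint; the full model is too large for most machines.
generator = pipeline("text-generation", model="bigscience/bloom-560m")

prompt = "The BigScience collaboration trained BLOOM to"
print(generator(prompt, max_new_tokens=30)[0]["generated_text"])
```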

BLOOM could find uses in research outside AI. Francesco de Toni, a linguist at the University of Western Australia in Perth, jointly leads a BigScience working group that is looking at using models to extract information from collections of historical texts that are too large to go through by hand. Models can, for example, extract all the names or goods mentioned in a collection of letters by Renaissance merchants — information that would be impossible to find using a search engine.
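
A hedged sketch of the kind of extraction de Toni describes, prompting a small checkpoint to list the people and goods in a passage; the letter, the prompt format and the checkpoint are illustrative assumptions, not the working group's actual method.

```python
from transformers import pipeline

# Prompt a small BLOOM checkpoint to list entities mentioned in a passage. Illustrative only.
generator = pipeline("text-generation", model="bigscience/bloom-560m")

letter = ("Messer Francesco writes from Venice that forty bales of pepper and "
          "twelve casks of Malmsey wine were sent to Antonio in Bruges.")
prompt = (f"Passage: {letter}\n"
          "People and goods mentioned in the passage:\n- ")
print(generator(prompt, max_new_tokens=40)[0]["generated_text"])
```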

BLOOM comes with documentation that outlines its capabilities and limitations. Using it also requires signing up to an evolving legal licence that commits researchers not to use the model for malicious or inappropriate ends, such as generating fake news. The collaboration will monitor how the model is applied and adjust the licence and documentation as necessary, says Giada Pistilli, an ethicist at Hugging Face and philosopher at the Sorbonne University in Paris who co-chaired BigScience’s ethical and legal working group. “It’s really hard to imagine and predict all the uses,” she says.


4 Important Tips for Having a Vacation Abroad


Are you planning to go abroad but still unsure what to prepare? Many people dream of travelling overseas, especially to destinations such as America and Europe. If this is your first trip abroad, check out the following tips!

Prepare All Important Documents

The first thing you need to do is prepare your important documents: passport, ID card, visa, and an international driving license if you plan to drive abroad. Make sure you know whether the country you are visiting is visa-free. Southeast Asian countries, the Maldives, and Turkey are visa-free, so a passport is enough, but you will still need a visa for South Korea, Europe, or America. Scan your documents and save them to a cloud service such as Google Drive or iCloud. Also remember to check your vaccination status, because every country will ask for your health information.

Make Itineraries

An itinerary is essential for anyone travelling abroad. Holidays overseas cost a lot of money, so make the most of the trip with a well-planned schedule. Research the destinations you want to visit in detail: what makes each one special, ticket prices, how to get there, and how far it is from where you are staying. Remember to include the restaurants you want to try, and make sure they suit your preferences, such as serving halal food or avoiding ingredients you are allergic to.

Book Tickets in Advance

Once you know how long your vacation will be and have your itinerary ready, it’s time to book plane tickets and lodging. Find cheap tickets by:

  1. Using promos and discounts on travel agent applications.
  2. Comparing which price is lower and what kind of facilities you will get.
  3. Choosing accommodation that fits your budget but is still comfortable.

Also remember to check the pandemic situation in the country you are visiting and whether you will have to quarantine, because that will affect your itinerary and accommodation. Since the pandemic has not fully subsided, check as well whether Indonesia still requires quarantine when you return from your vacation.

Exchange Money and Check Your ATM Cards

Exchange your money for the destination country’s currency, for example yen, euros, dollars, or won. But don’t carry too much cash: it is prone to theft and makes it easy to overspend. For everything else, you can pay cashlessly. Check whether your bank’s ATM card carries a Visa, MasterCard, or Cirrus logo; these logos indicate that your bank works with banks abroad. You can also use a credit card to make transactions easier.


Down 43%, Is This Tech Stock Worth Buying Right Now?


Skyworks Solutions (NASDAQ: SWKS) announced its fiscal 2022 fourth-quarter results (for the three months ended September 30) on November 3, and the Apple supplier’s stock price has risen 11% since then.

Skyworks beat expectations and showed solid growth at a time when smartphone sales were declining, but its forecast suggests the chipmaker is about to hit a bump. With that in mind, let’s take a closer look at the latest results and at whether the stock can sustain its new momentum after losing 43% of its value in 2022.

Skyworks delivers solid results outside its mobile business

Skyworks’ fourth-quarter revenue increased 7% year-over-year to a record $1.4 billion. The company also reported non-GAAP (adjusted) earnings of $3.02 per share, up 15% year-over-year, easily beating the analyst estimate of $2.91 per share. For the full year, revenue increased 7% to $5.5 billion and earnings rose at a similar rate to $11.24 per share.

The chipmaker’s strong fourth-quarter growth was the result of successful diversification into new markets such as the Internet of Things (IoT) and automotive, as well as its relationships with major smartphone original equipment manufacturers (OEMs), which helped offset weakness in the smartphone market. However, it was the non-mobile business that did most of the heavy lifting for Skyworks last quarter.

As CFO Chris Sennesael noted on the latest earnings conference call, the company generated $500 million in revenue from its broad markets segment (which counts chip sales for non-mobile applications such as IoT), up 30% from the previous year. The broad markets segment contributed 36% of Skyworks’ revenue last quarter, up from 29% in the same period last year.

It’s also worth noting that Skyworks earned $2 billion in revenue from this segment over the full fiscal year, almost 43% more than the $1.4 billion it brought in the previous fiscal year. The good news is that the broad markets business looks able to maintain that momentum, because, as Skyworks showed in its earnings report, it is attracting new customers in high-growth niches such as IoT.

“In IoT, we continue to win new customers and expand our content. We have partnered with Vodafone to launch the UK’s first Wi-Fi 6E platform, and we have launched a solution for Wi-Fi 6 hotspots.”

Skyworks is also enabling the deployment of O-RAN (open radio access network) technology and delivered record quarterly results in its high-growth automotive niche. The O-RAN market is expected to grow at an annual rate of 42% until 2030, while, according to Mordor Intelligence, demand for connected cars will grow by 19% per year until 2027.

These catalysts explain why Skyworks expects its broad markets segment “to be a major driver in FY23 and beyond.”

The mobile business was not at its best last quarter

Skyworks’ mobile business generated approximately $907 million in revenue last quarter (total revenue minus the $500 million from the broad markets business). By comparison, in the year-ago quarter the mobile business supplied 71% of Skyworks’ $1.31 billion in revenue, or nearly $931 million.
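
As a quick back-of-the-envelope check of those figures (all amounts in millions of US dollars; the precise quarterly total of $1,407 million is an assumption inferred from the “record $1.4 billion” cited above):

```python
# Back-of-the-envelope check of the mobile-revenue figures (millions of US dollars).
# The 1_407 total is an assumed precise value behind the reported "record $1.4 billion".
total_revenue = 1_407
broad_markets = 500
mobile_now = total_revenue - broad_markets     # ~907, as stated above
mobile_year_ago = 0.71 * 1_310                 # 71% of $1.31 billion, ~930
print(mobile_now, round(mobile_year_ago))      # 907 930
```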

Thus, the mobile business, which generates most of the company’s revenue, declined year-over-year in the most recent quarter. That is not surprising, given that smartphone sales have been falling for the past five quarters. Skyworks counts Apple as its biggest customer, with the smartphone giant accounting for 58% of its revenue last year.

Last quarter, Apple shipped 48.5 million smartphones, 6.4% more than a year earlier. However, the overall smartphone market was down 9% year-over-year, and things could now get even worse for Skyworks.

All of this explains why Skyworks’ management is guiding for a sharp drop in sales and profits. The chipmaker expects revenue of $1.3 billion to $1.35 billion and adjusted earnings of $2.59 per share in the first quarter of fiscal 2023, figures that imply double-digit declines in both revenue and earnings compared with the year-ago period.


Tech Shares May Weigh On Taiwan Stock Market


(RTTNews) – The Taiwanese stock market has climbed in two straight sessions, gaining nearly 230 points (1.7%) along the way. The Taiwan Stock Exchange now sits just above the 14,700-point plateau, although it may run out of steam on Wednesday.

The global forecast for Asian markets is mixed, with little movement expected ahead of major economic events that could affect the interest-rate outlook. European and US markets finished mixed and roughly flat, and Asian bourses are expected to follow suit.

The Taiwan Stock Exchange closed sharply higher on Tuesday on gains in financial, technology and cement stocks.

The index closed at 14,709.64, up 152.77 points (1.05%) after trading between 14,449.05 and 14,716.58.

Among the actives, Cathay Financial rose 3.45%, Mega Financial gained 1.78%, CTBC Financial climbed 2.93%, Fubon Financial added 2.94%, First Financial rose 1.35%, E Sun Financial gained 1.66%, Taiwan Semiconductor Manufacturing Company rose 1.35%, United Microelectronics rose 1.35%, Catcher Technology added 0.56%, Largan Precision shed 0.22%, MediaTek rose 1.42%, Delta Electronics gained 1.71%, Novatek Microelectronics added 0.51%, China Steel climbed 2.87%, Formosa Plastics shed 0.22%, Nan Ya Plastics rose 0.92%, Asia Cement gained 1.48%, Taiwan Cement rose 1.67%, and Hon Hai Precision was unchanged.

The lead from Wall Street suggests a slightly negative bias: the major averages opened higher, slid in the middle of the session, then recovered to end mixed and little changed.

The Dow rose 3.07 points (0.01%) to close at 33,852.53, the NASDAQ fell 65.72 points (0.59%) to close at 10,983.78, and the S&P 500 fell 6.31 points (0.16%) to 3,957.63.

Volatile trading on Wall Street comes amid continued uncertainty about the situation in China following widespread outcry over the country’s Covid restrictions.

Traders may also have been reluctant to make any significant moves ahead of comments today from Federal Reserve Chairman Jerome Powell, which could provide further clues about the rate outlook. Closely watched jobs data is also due on Friday.

In terms of economic news, the Conference Board released a report showing a moderate decline in US consumer confidence in November.

Crude oil futures ended higher on Tuesday, extending gains from the previous session on hopes that OPEC could cut production later this week to support prices. West Texas Intermediate crude futures for January delivery rose $0.96, or 1.2%, to $78.20 a barrel.
