The leak of the source code of the Russian search engine Yandex gives programmers and other interested parties a chance to dig into its inner workings and discover some striking details.
Some media reported last week that a source code repository for the Yandex search engine had allegedly been stolen by a former employee of the Russian tech company and leaked as a torrent on a popular hacking forum.
The leak comprises 44.7 GB of files stolen from the company in July 2022. These code repositories supposedly contain all of the company’s source code, plus anti-spam rules. In a statement, Yandex said that an initial investigation showed that the leaked code “appears to be old snippets that differ from the current version in the company’s repository.”
Security researcher Arseniy Shestakov claims that the exposed files date back to February 2022, coinciding with the Russian invasion of Ukraine. Shestakov said the leaked files included the source code for several services but did not contain sensitive user data.
As noted, the leaked repository contains only code. Other important parts, such as the model weights for the neural networks, are missing, so it’s almost useless. Still, there are many interesting files with names like “blacklist.txt” that could expose running services.
Even so, the leak is unprecedented. Many details are still missing, but for the first time the inner workings of a search engine are open to inspection. That gives us a chance to explore the little-known world of Yandex and to explain, in case you are not already familiar with it, what this search engine consists of.
What is Yandex?
Let’s start with the basics before turning to the leak itself. Yandex is a Russian technology company best known for its Yandex search engine. According to Statcounter, in July 2020 Yandex had a 39.6% market share in Russia, compared to Google’s 57.9%.
Yandex works like other search engines such as Google or Bing: you type a query, press Enter, and a list of results appears. According to one study, Yandex generates 52% of web traffic in Russia. It also grew in popularity when Android phones in Russia stopped using Google as the default search engine in 2017.
That said, while it works like most conventional search engines, some key differences set Yandex apart from competitors like Google.
For example, Yandex places more emphasis than Google on local SEO and regionality, and it performs region-dependent searches that only show websites from a specific region. This means that people in different places will be shown different results for the same search term.
In addition, user behavior and dwell time are key ranking factors for Yandex. Although Google also takes them into account, they are critical for a good ranking in Yandex.
Finally, while link building is still important, it is more about driving relevant traffic to your site than about demonstrating the site’s authority or reliability. Domain age and creation date play a bigger role in Yandex’s ranking, so you can find a lot of outdated content in its results.
Some conclusions about Yandex after the leak of its code
We have picked out some of the conclusions that most caught our attention. If you want to read the article in its entirety, see the linked source from Search Engine.
On the one hand, this search engine has anti-SEO upper limits for some ranking factors. In addition, 39 of the ranking factors are among the initially weighted factors that can prevent a page from being included in the initial list of publications.
Many search engines, Bing among them, incorporate something similar. For example, the abusive use of meta keywords can be treated as a negative factor. Yandex, however, seems to go much further.
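To make the idea of an upper limit on a ranking factor more concrete, here is a minimal sketch in Python. It is purely illustrative: the factor names, weights, and caps are invented for this example and are not taken from the leaked code. It only shows how capping a factor means that piling on more of it stops paying off.

```python
from dataclasses import dataclass


@dataclass
class Factor:
    weight: float  # contribution per unit of the raw value (invented number)
    cap: float     # upper limit: raw values above this earn nothing extra


# Hypothetical factors; none of these names or values come from Yandex.
FACTORS = {
    "keyword_matches": Factor(weight=0.5, cap=4.0),
    "inbound_links": Factor(weight=0.25, cap=100.0),
}


def score(raw_values: dict[str, float]) -> float:
    """Sum the weighted factors, clamping each raw value at its cap."""
    total = 0.0
    for name, factor in FACTORS.items():
        value = raw_values.get(name, 0.0)
        total += factor.weight * min(value, factor.cap)
    return total


# A page with 4 keyword matches and one with 40 get the same score,
# so stuffing extra keywords brings no benefit in this toy model.
honest = score({"keyword_matches": 4})
stuffed = score({"keyword_matches": 40})
```

In this toy model the cap simply removes the reward for over-optimization. A real engine could also apply a penalty above the limit, but the sketch does not assume that.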
On the other hand, the analysis suggests that certain parameters benefit more than others from the reinforcement mechanism known as “boosting”.
For example, it mentions that the smallest files are included and, most strikingly, that Yandex applies a boost that skews its results toward certain news organizations, so it can lift whomever it wants in its rankings.
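As a rough illustration of how a boost can skew results, the sketch below multiplies a page’s base score by a per-domain factor before sorting. Everything here is an assumption made for the example: the domains, the multiplier, and the function names are hypothetical and do not come from the leaked code. The “boost” in this sketch is a plain score multiplier, not the gradient-boosting family of machine-learning methods.

```python
# Hypothetical boost table: the domain and multiplier are invented.
BOOSTS = {"news.example": 1.5}


def apply_boost(base_score: float, domain: str) -> float:
    """Multiply the base score by the domain's boost (1.0 if none is set)."""
    return base_score * BOOSTS.get(domain, 1.0)


def rank(results: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Sort (domain, base_score) pairs by boosted score, best first."""
    return sorted(results, key=lambda r: apply_boost(r[1], r[0]), reverse=True)


# The blog has the higher base score, but the boosted news site ranks first.
ordered = rank([("blog.example", 0.5), ("news.example", 0.4)])
```

The point of the sketch is only that a small multiplier on selected sources is enough to change the order of results, even when their unboosted scores are lower.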
To understand how this is possible, keep in mind that programmers often use descriptive terms or names so that other developers can understand what a given function or line of code does.
This reduces the time they need to find the relevant code when they have to modify or update it. In this case, the Yandex developers appear to have replaced a generic term for a feature with offensive language.
It is not clear why exactly these terms were included. However, the use of offensive language in the code is a violation of good practice and, as Yandex pointed out in its statement, is against its code of ethics.
Yandex did not provide any additional information as to why certain profanities were used, but those who have examined the code noted that they also appear to have been used to replace “workers” in various parts of the codebase. It is certainly a curiosity for anyone interested in this and other search engines.