Iterative Viterbi decoding is an algorithm that spots the subsequence S of an observation O = {o1, ..., on} having the highest average probability (i.e., probability scaled by the length of S) of being generated by a given hidden Markov model M with m states. The algorithm uses a modified Viterbi algorithm as an internal step. The scaled probability measure was first proposed by John S. Bridle. An early algorithm to solve this problem, sliding window, was proposed by Jay G. Wilpon et al., 1989, with constant cost T = mn2/2. A faster algorithm consists of an iteration of calls to the Viterbi algorithm, reestimating a filler score until convergence. == The algorithm == A basic (non-optimized) version, finding the sequence s with the smallest normalized distance from some subsequence of t is: // input is placed in observation s[1..n], template t[1..m], // and [[distance matrix]] d[1..n,1..m] // remaining elements in matrices are solely for internal computations (int, int, int) AverageSubmatchDistance(char s[0..(n+1)], char t[0..(m+1)], int d[1..n,0..(m+1)]) { // score, subsequence start, subsequence end declare int e, B, E t'[0] := t'[m+1] := s'[0] := s'[n+1] := 'e' e := random() do e' := e for i := 1 to n do d'[i,0] := d'[i,m+1] := e (e, B, E) := ViterbiDistance(s', t', d') e := e/(E-B+1) until (e == e') return (e, B, E) } The ViterbiDistance() procedure returns the tuple (e, B, E), i.e., the Viterbi score "e" for the match of t and the selected entry (B) and exit (E) points from it. "B" and "E" have to be recorded using a simple modification to Viterbi. A modification that can be applied to CYK tables, proposed by Antoine Rozenknop, consists in subtracting e from all elements of the initial matrix d.
Language model benchmark
A language model benchmark is a standardized test designed to evaluate the performance of language models on various natural language processing tasks. These tests are intended for comparing different models' capabilities in areas such as language understanding, generation, and reasoning. Benchmarks generally consist of a dataset and corresponding evaluation metrics. The dataset provides text samples and annotations, while the metrics measure a model's performance on tasks like answering questions, text classification, and machine translation. These benchmarks are developed and maintained by academic institutions, research organizations, and industry players to track progress in the field. In addition to accuracy, the metrics can include throughput, energy efficiency, bias, trust, and sustainability. == Overview == === Types === Benchmarks may be described by the following adjectives, not mutually exclusive: Classical: These tasks are studied in natural language processing, even before the advent of deep learning. Examples include the Penn Treebank for testing syntactic and semantic parsing, as well as bilingual translation benchmarked by BLEU scores. Question answering: These tasks have a text question and a text answer, often multiple-choice. They can be open-book or closed-book. Open-book QA resembles reading comprehension questions, with relevant passages included as annotation in the question, in which the answer appears. Closed-book QA includes no relevant passages. Closed-book QA is also called open-domain question-answering. Before the era of large language models, open-book QA was more common, and understood as testing information retrieval methods. Closed-book QA became common since GPT-2 as a method to measure knowledge stored within model parameters. Omnibus: An omnibus benchmark combines many benchmarks, often previously published. It is intended as an all-in-one benchmarking solution. Reasoning: These tasks are usually in the question-answering format, but are intended to be more difficult than standard question answering. Multimodal: These tasks require processing not only text, but also other modalities, such as images and sound. Examples include OCR and transcription. Agency: These tasks are for a language-model–based software agent that operates a computer for a user, such as editing images, browsing the web, etc. Adversarial: A benchmark is "adversarial" if the items in the benchmark are picked specifically so that certain models do badly on them. Adversarial benchmarks are often constructed after state of the art (SOTA) models have saturated (achieved 100% performance) a benchmark, to renew the benchmark. A benchmark is "adversarial" only at a certain moment in time, since what is adversarial may cease to be adversarial as newer SOTA models appear. Public/Private: A benchmark might be partly or entirely private, meaning that some or all of the questions are not publicly available. The idea is that if a question is publicly available, then it might be used for training, which would be "training on the test set" and invalidate the result of the benchmark. Usually, only the guardians of the benchmark have access to the private subsets, and to score a model on such a benchmark, one must send the model weights, or provide API access, to the guardians. The boundary between a benchmark and a dataset is not sharp. Generally, a dataset contains three "splits": training, test, and validation. Both the test and validation splits are essentially benchmarks. In general, a benchmark is distinguished from a test/validation dataset in that a benchmark is typically intended to be used to measure the performance of many different models that are not trained specifically for doing well on the benchmark, while a test/validation set is intended to be used to measure the performance of models trained specifically on the corresponding training set. In other words, a benchmark may be thought of as a test/validation set without a corresponding training set. Conversely, certain benchmarks may be used as a training set, such as the English Gigaword or the One Billion Word Benchmark, which in modern language is just the negative log-likelihood loss on a pretraining set with 1 billion words. Indeed, the distinction between benchmark and dataset in language models became sharper after the rise of the pretraining paradigm, whereby a model is first trained on massive, unlabeled datasets to learn general language patterns, syntax, and knowledge (pretraining), and the base model is then adapted to specific, downstream tasks using smaller, labeled datasets (fine-tuning). === Lifecycle === Generally, the life cycle of a benchmark consists of the following steps: Inception: A benchmark is published. It can be simply given as a demonstration of the power of a new model (implicitly) that others then picked up as a benchmark, or as a benchmark that others are encouraged to use (explicitly). Growth: More papers and models use the benchmark, and the performance on the benchmark grows. Maturity, degeneration or deprecation: A benchmark may be saturated, after which researchers move on to other benchmarks. Progress on the benchmark may also be neglected as the field moves to focus on other benchmarks. Renewal: A saturated benchmark can be upgraded to make it no longer saturated, allowing further progress. === Construction === Like datasets, benchmarks are typically constructed by several methods, individually or in combination: Web scraping: Ready-made question-answer pairs may be scraped online, such as from websites that teach mathematics and programming. Conversion: Items may be constructed programmatically from scraped web content, such as by blanking out named entities from sentences, and asking the model to fill in the blank. This was used for making the CNN/Daily Mail Reading Comprehension Task. Crowd sourcing: Items may be constructed by paying people to write them, such as on Amazon Mechanical Turk. This was used for making the MCTest. === Evaluation === Generally, benchmarks are fully automated. This limits the questions that can be asked. For example, with mathematical questions, "proving a claim" would be difficult to automatically check, while "calculate an answer with a unique integer answer" would be automatically checkable. With programming tasks, the answer can generally be checked by running unit tests, with an upper limit on runtime. The benchmark scores are of the following kinds: For multiple choice or cloze questions, common scores are accuracy (frequency of correct answer), precision, recall, F1 score, etc. pass@n: The model is given n {\displaystyle n} attempts to solve each problem. If any attempt is correct, the model earns a point. The pass@n score is the model's average score over all problems. k@n: The model makes n {\displaystyle n} attempts to solve each problem, but only k {\displaystyle k} attempts out of them are selected for submission. If any submission is correct, the model earns a point. The k@n score is the model's average score over all problems. cons@n: The model is given n {\displaystyle n} attempts to solve each problem. If the most common answer is correct, the model earns a point. The cons@n score is the model's average score over all problems. Here "cons" stands for "consensus" or "majority voting". The pass@n score can be estimated more accurately by making N > n {\displaystyle N>n} attempts, and use the unbiased estimator 1 − ( N − c n ) ( N n ) {\displaystyle 1-{\frac {\binom {N-c}{n}}{\binom {N}{n}}}} , where c {\displaystyle c} is the number of correct attempts. For less well-formed tasks, where the output can be any sentence, there are the following commonly used scores including BLEU ROUGE, METEOR, NIST, word error rate, LEPOR, CIDEr, and SPICE. === Issues === error: Some benchmark answers may be wrong. ambiguity: Some benchmark questions may be ambiguously worded. subjective: Some benchmark questions may not have an objective answer at all. This problem generally prevents creative writing benchmarks. Similarly, this prevents benchmarking writing proofs in natural language, though benchmarking proofs in a formal language is possible. open-ended: Some benchmark questions may not have a single answer of a fixed size. This problem generally prevents programming benchmarks from using more natural tasks such as "write a program for X", and instead uses tasks such as "write a function that implements specification X". inter-annotator agreement: Some benchmark questions may be not fully objective, such that even people would not agree with 100% on what the answer should be. This is common in natural language processing tasks, such as syntactic annotation. shortcut: Some benchmark questions may be easily solved by an "unintended" shortcut. For example, in the SNLI benchmark, having a negative word like "not" in the second sentence is a strong signal for the "Contradiction" category, regardless of what the se
Conservative morphological anti-aliasing
Conservative morphological anti-aliasing (CMAA) is an antialiasing technique originally developed by Filip Strugar at Intel. CMAA is an image-based, post processing technique similar to that of morphological antialiasing. CMAA uses 4 main steps which are image analysis for color discontinuities, locally dominant edge detection, simple shape handling, and lastly symmetrical long edge shape handling. A couple of years after CMAA was introduced, Intel unveiled an updated version which they named CMAA2.
SEMAT
SEMAT (Software Engineering Method and Theory) is an initiative to reshape software engineering such that software engineering qualifies as a rigorous discipline. The initiative was launched in December 2009 by Ivar Jacobson, Bertrand Meyer, and Richard Soley with a call for action statement and a vision statement. The initiative was envisioned as a multi-year effort for bridging the gap between the developer community and the academic community and for creating a community giving value to the whole software community. The work is now structured in four different but strongly related areas: Practice, Education, Theory, and Community. The Practice area primarily addresses practices. The Education area is concerned with all issues related to training for both the developers and the academics including students. The Theory area is primarily addressing the search for a General Theory in Software Engineering. Finally, the Community area works with setting up legal entities, creating websites and community growth. It was expected that the Practice area, the Education area and the Theory area would at some point in time integrate in a way of value to all of them: the Practice area would be a "customer" of the Theory area, and direct the research to useful results for the developer community. The Theory area would give a solid and practical platform for the Practice area. And, the Education area would communicate the results in proper ways. == Practice area == The first step was here to develop a common ground or a kernel including the essence of software engineering – things we always have, always do, always produce when developing software. The second step was envisioned to add value on top of this kernel in the form of a library of practices to be composed to become specific methods, specific for all kinds of reasons such as the preferences of the team using it, kind of software being built, etc. The first step is as of this writing just about to be concluded. The results are a kernel including universal elements for software development – called the Essence Kernel, and a language – called the Essence Language - to describe these elements (and elements built on top of the kernel (practices, methods, and more). Essence, including both the kernel and language, has been published as an OMG standard in beta status in July 2013 and is expected to become a formally adopted standard in early 2014. The second step has just started, and the Practice area will be divided into a number of separate but interconnected tracks: the practice (library track), the tool track are so far identified and work has started or is about to get started. The practice track is currently working on a Users Guide. == Education area == The area focuses on leveraging the work of SEMAT in software engineering education, both within academia and industry. It promotes global education based on a common ground called Essence. The area's target groups are instructors such as university professors and industrial coaches as well as their students and learning practitioners. The goal of the area is to create educational courses and course materials that are internationally viable, identify pedagogical approaches that are appropriate and effective for specific target groups and disseminate experience and lessons learned. The area includes members from a number of universities and institutes worldwide. Most members have already been involved in leveraging aspects of SEMAT in the context of their software engineering courses. They are gathering their resources and starting a common venture towards defining a new generation of SEMAT-powered software engineering curricula. As of 2018, some studies of utilizing Essence in educational settings exist. One example of the use of Essence in university education was a software engineering course carried out in Norwegian University of Science and Technology. A study was conducted by introducing Essence into a project-based software engineering course, with the aim of understanding what difficulties the students faced in using Essence, and whether they considered it to have been useful. The results indicated that Essence could also be useful for novice software engineers by (1) encouraging them to look up and study new practices and methods in order to create their own, (2) encouraging them to adjust their way-of-working reflectively and in a situation-specific manner, (3) helping them structure their way of working. The findings of another study introducing students to Essence through a digital game supported these findings: the students felt that Essence will be useful to them in future, real-world projects, and that they wish to utilize it in them. == Theory area == An important part of SEMAT is that a general theory of software engineering is planned to emerge with significant benefits. A series of workshops held under the title SEMAT Workshop on a General Theory of Software Engineering (GTSE) are a key component in awareness building around general theories. In addition to community awareness building, SEMAT also aims to contribute with a specific general theory of software engineering. This theory should be solidly based on the SEMAT Essence language and kernel, and should support software engineering practitioners' goal-oriented decision making. As argued elsewhere, such support is predicated on the predictive capabilities of the theory. Thus, the SEMAT Essence should be augmented to allow the prediction of critical software engineering phenomena. The GTSE workshop series assists in the development of the SEMAT general software engineering theory by engaging a larger community in the search for, development of, and evaluation of promising theories, which may be used as a base for the SEMAT theory. == Organizational structure == === Main organization === SEMAT is chaired by Sumeet S. Malhotra of Tata Consultancy Services. The CEO of the organization is Ste Nadin of Fujitsu. The Executive Management Committee of SEMAT are Ivar Jacobson, Ste Nadin, Sumeet S. Malhotra, Paul E. McMahon, Michael Goedicke and Cecile Peraire. === Japan Chapter === Japan Chapter was established in April 2013, and it has more than 250 members as of November 2013. Member activities include carrying out seminars about SEMAT, considering utilization of SEMAT Essence for integrating different requirements engineering techniques and body of knowledges (BoKs), and translating articles into Japanese. === Korea Chapter === The chapter was inaugurated with about 50 members in October 2013. Member activities include: 2e Consulting started rewriting their IT service engagement methods using the Essence kernel, and uEngine Solutions started developing a tool to orchestrate Essence-kernel based practices into a project method. Korean government supported KAIST to conduct research in Essence. === Latin American Chapter === Semat Latin American Chapter was created in August 2011 in Medellin (Colombia) by Ivar Jacobson during the Latin American Software Engineering Symposium. This Chapter has 9 Executive Committee members from Colombia, Venezuela, Peru, Brazil, Argentina, Chile, and Mexico, chaired by Dr. Carlos Zapata from Colombia. More than 80 people signed the initial declaration of the Chapter and nowadays the Chapter members are in charge of disseminating the Semat ideas in all Latin America. Chapter members have participated in various Latin American conferences, including the Latin American Conference on Informatics (CLEI), the Ibero American Software Engineering and Knowledge Engineering Journeys (JIISIC), the Colombian Computing Conference (CCC), and the Chilean Computing Meeting (ECC). The Chapter contributed in the submission sent in response to the OMG call for proposals and currently studies didactic strategies for teaching the Semat kernel by games, theoretical studies about some kernel elements, and practical representations of several software development and quality methods by using the Semat kernel. Some of the members also translated the Essence book and some other Semat materials and papers into Spanish. === Russia Chapter === Russian Chapter has about 20 members. A few universities have incorporated SEMAT in their training courses , including Moscow State University, Moscow Institute of Physics and Technology, Higher School of Economics, Moscow State University of Economics, Statistics, and Informatics. The chapter and some commercial companies are carrying out seminars about SEMAT. INCOSE Russian Chapter is working on an extension of SEMAT to systems engineering. EC-leasing is working on an extension of the Kernel for Software Life Cycle. Russian Chapter attended in two conferences: Actual Problems of System and Software Engineering and SECR with SEMAT section and articles. Translation of the Essence book into Russian is in progress. == Practical Applications of SEMAT == Ideas developed by the SEMAT community have been applied by both industry and ac
Web Dynpro
Web Dynpro (WD) is a web application technology developed by SAP SE that focuses on the development of server-side business applications. For modern releases (for instance as of NetWeaver 750, software layer SAP_UI) the user interface is rendered according to the HTML5 web standard. Since Netweaver 754 (software layer SAP_UI, ABAP Platform 1909) a touch enabled user interface is available. The newly released versions usually follow the SAP Fiori design principles. One of its main design features is that the user interface is defined in an entirely declarative manner. Web Dynpro applications can be developed using either the Java (Web Dynpro for Java, WDJ or WD4J) or ABAP (Web Dynpro ABAP, WDA or WD4A) development infrastructure. == Overview == The earliest version of Web Dynpro appeared in 2003 and was based on Java. This variant was released approximately 18 months before the ABAP variant. As of 2010, the Java variant of Web Dynpro was put into maintenance mode. WD follows a design architecture based on an interpretation of the MVC design pattern and uses a model driven development approach ("minimize coding, maximize design"). The Web Dynpro Framework is a server-side runtime environment into which many dedicated "hook methods" are available. The developer then places their own custom coding within these hook methods in order to implement the desired business functionality. These hook methods belong to one of the broad categories of either "life-cycle" and "round-trip"; that is, those methods that are concerned with the life-cycle of a software component (i.e. processing that takes place at start up and shut down etc.), and those methods that are concerned with processing the fixed sequence of events that take place during a client-initiated round trip to the server. Web Dynpro is aimed at the development of business applications that follow standardized UI principles, applications that connect to backend systems and which are scalable. Key Capabilities Declarative way of development: Web Dynpro offers a graphical and declarative means of UI development. UI controls, building blocks, views and windows are modeled, and the business logic can be coded separately. Separation of user interface and business logic: One advantage of Web Dynpro over SAP GUI is the separation between business logic and user interface, and the structured development process with less implementation effort. Support of stateful application: The state of the application is kept in the back-end. This leads to a reduced data transfer from ABAP server to browser and vice versa. Regarding Web Dynpro ABAP there is only one programming language (ABAP) and only one system necessary. Therefore, development can be easier and cost efficient.
Absher (application)
Absher (Arabic: أبشر ‘Absher, roughly meaning "good tidings" or "yes, done") is a smartphone application and web portal which allows citizens and residents of Saudi Arabia to use a variety of governmental services. Amongst several other services with the Absher app, it can be used to apply for jobs and Hajj permits, passport info can be updated, and electronic crimes can be reported. The application provides around 280 services for residents of Saudi Arabia including but not limited to making appointments, renewing passports, residents' cards, IDs, driver's licenses and others, and, controversially, enables Saudi men to track the whereabouts of women they control as part of the country's male guardianship system. The app can be downloaded from the Google Play Store and Apple App Store and is supplied by the Saudi Interior Ministry. According to the Ministry of the Interior, Absher has more than 20 million users. As of February 2019, Absher has been downloaded 4.2 million times from the App Store. Some services provided through Absher can also be accessed through the website absher.sa. In March 2021, Saudi Arabia launched the digital version of the Absher for individuals app through which the users can download a copy of their digital ID. Then, new services were added to the platform such as online birth and death registration services, requesting amendments to academic credentials, correcting names in English and marital status and requesting civil records of children. == Impact on women's rights == The app has been criticized by various human rights activists, human rights organisations and international communities. The US and European countries have also condemned the app and urged the kingdom to end its male guardianship system. Absher gained media attention in 2019 for its functions supporting the Saudi policy of male guardianship following an investigation by Business Insider. The app allows for designated guardians to receive notifications if a woman under their guardianship passes through an airport and subsequently gives them the option to withdraw her right to travel. In a few cases, this system has been circumvented by women who have been able to gain control over its settings and use it to allow themselves to travel. US Senator Ron Wyden of Oregon wrote a letter to the CEO's of Apple and Google, criticizing the app and demanding for its removal immediately. Wyden said "American companies should not enable or facilitate the Saudi government's patriarchy," and called the Saudi system of control over women "abhorrent". According to the EU lawmakers, current rules imposed over the women by the Saudi government make women “second-class citizens”. The lawmakers also asked the EU states to continue to build pressure on Riyadh so as to improve the conditions of women and human rights. Amnesty International and Human Rights Watch accused Apple and Google of helping "enforce gender apartheid" by hosting the app. US congresswomen Rep. Katherine Clark and Rep. Carolyn B. Maloney condemned the kingdom's male guardianship system that reflected from the app, calling Absher a "patriarchal weapon" and asking for its removal. In response to the criticism received by Absher, Apple chief executive officer Tim Cook stated in February 2019 that he intended to investigate the situation. Similarly, Google announced that it would also review the application. After a prompt review, Google declined to remove the app from Google Play, citing that it did not violate the agreed upon terms and conditions of the store. Saudi doctor Khawla Al-Kuraya supported this app an editorial in Bloomberg News. Kuraya wrote that Absher helped Saudi women avoid governmental bureaucracy as it allows their male guardians to process their travel permits anywhere and anytime through Absher. Although she believes that the guardianship system needs to be reconsidered, she thinks that Absher is an important step towards facilitating women-guardians related issues in Saudi Arabia. Absher manager Atiyah Al-Anazy announced in 2019 that two million women were using the application in Saudi Arabia to facilitate their transactions. Some female users stated that the application has made their movement and travel-related issues easier. New measures were introduced that year to allow Saudi women above the age of 18 to travel without their male guardians, which ultimately released male authoritative rights on women. A law was subsequently passed allowing women over the age of 21 to receive a passport and travel without prior male permission.
Pocket (service)
Pocket, formerly known as Read It Later, was a social bookmarking service for storing, sharing and discovering web bookmarks, first released in 2007. Mozilla, the developer of Pocket, announced in May 2025 that it was discontinuing the service and would shut it down in July of that year. == History == Pocket was introduced in August 2007 as a Mozilla Firefox browser extension named Read It Later by Nathan (Nate) Weiner. Once his product was used by millions of people, he moved his office to Silicon Valley and four other people joined the Read It Later team. Weiner's intention was for the application to be like a TiVo directory for web content and to give users access to that content on any device. Read It Later obtained venture capital investments of US$2.5 million in 2011 and $5.0 million in 2012. The 2011 funding came from Foundation Capital, Baseline Ventures, Google Ventures, Founder Collective and unnamed angel investors. The company rejected an acquisition offer by Evernote after showing concerns that Evernote intended to shut down the Read It Later service and amalgamate its functionality into Evernote's main service. Initially, the Read It Later app was available in a free version and a paid version that included additional features. After the rebranding to Pocket, all paid features were made available in a free and advertisement-free app. In May 2014, a paid subscription service called Pocket Premium was introduced, adding server-side storage of articles and more powerful search tools. In June 2015, Pocket was included in Firefox, via a toolbar button and link to a user's Pocket list in the bookmark's menu. The integration was controversial, as users displayed concerns for the direct integration of a proprietary service into an open source application, and that it could not be completely disabled without editing advanced settings, unlike other third-party extensions. A Mozilla spokesperson stated that the feature was meant to leverage the service's popularity among Firefox users and clarified that all code related to the integration was open source. The spokesperson added that "[Mozilla had] gotten lots of positive feedback about the integration from users". On February 27, 2017, Pocket announced that it had been acquired by Mozilla Corporation, the commercial arm of Firefox's non-profit development group. Mozilla staff stated that Pocket would continue to operate as an independent subsidiary but that it would be leveraged as part of an ongoing "Context Graph" project. There were plans to open-source the server-side code of Pocket, though only parts of the project had been open-sourced as of 2024. On May 22, 2025, Mozilla announced that it would shut down Pocket on July 8, 2025. Exports of user data would be available until October 8, 2025, when accounts would be deleted. The email newsletter Pocket Hits was rebranded as Ten Tabs on June 12 as part of the closure, with it being changed to release only on weekdays. == Functions == The application allows the user to save an article or web page to remote servers for later reading. The article is sent to the user's Pocket list (synced to all of their devices) for offline reading. Pocket makes the article more readable by removing clutter and enabling the user to add tags and adjust text settings. == User base == The application had 17 million users and 1 billion saves, as of September 2015. Pocket was listed among Time magazine's 50 Best Android Applications for 2013. == Reception == Kent German of CNET said that "Read It Later is oh so incredibly useful for saving all the articles and news stories I find while commuting or waiting in line." Erez Zukerman of PC World said that supporting the developer is enough reason to buy what he deemed a "handy app". Bill Barol of Forbes said that although Read It Later works less well than Instapaper, "it makes my beloved Instapaper look and feel a little stodgy." In 2015, Pocket was awarded a Material Design Award for Adaptive Layout by Google for their Android application.