Master data management

Master data management (MDM) is a discipline in which business and information technology collaborate to ensure the uniformity, accuracy, stewardship, semantic consistency, and accountability of the enterprise's official shared master data assets. == Reasons for master data management == Data consistency and accuracy: MDM ensures that the organization's critical data is consistent and accurate across all systems, reducing discrepancies and errors caused by multiple, siloed copies of the same data. Improved decision-making: By providing a single version of the truth (SVOT), MDM enables organizations to deliver the right data to decision makers, allowing them to clearly understand business performance and make informed, data-driven decisions. Operational efficiency: With the consistent and accurate data provided by an MDM, operational processes such as reporting and inventory management can be automated to improve efficiency. Employee learning, onboarding, and customer service also become more efficient, as MDM data facilitates rapid, accurate, and thorough information retrieval, permitting more employee time to be spent on work. Regulatory compliance: MDM tries to help organizations comply with industry standards and regulations by ensuring that master data is accurately recorded, maintained, and audited. However, issues with data quality, classification, and reconciliation may require data transformation. As with other Extract, Transform, Load-based data movements, these processes are expensive and inefficient, reducing return on investment for a project. == Business unit and product line segmentation == As a result of business unit and product line segmentation, the same entity (whether a customer, supplier, or product) will be included in different product lines. This leads to data redundancy and even confusion. For example, a customer takes out a mortgage at a bank. 
If the marketing and customer service departments have separate databases, advertisements might still be sent to the customer, even though they've already signed up. The two parts of the bank are unaware, and the customer is sent irrelevant communications. Record linkage can associate different records corresponding to the same entity, mitigating this issue. == Mergers and acquisitions == One of the most common problems for master data management is company growth through mergers or acquisitions. Reconciling these separate master data systems can present difficulties, as existing applications have dependencies on the master databases. Ideally, database administrators resolve this problem through deduplication of the master data as part of the merger. Over time, as further mergers and acquisitions occur, the problem can multiply. Data reconciliation processes can become extremely complex or even unreliable. Some organizations end up with 10, 15, or even 100 separate and poorly integrated master databases. This can cause serious problems in customer satisfaction, operational efficiency, decision support, and regulatory compliance. Another problem involves determining the proper degrees of detail and normalization to include in the master data schema. For example, in a federated Human Resources environment, the enterprise software may focus on storing people's data as current status, adding a few fields to identify the date of hire, date of last promotion, etc. However, this simplification can introduce business-impacting errors into dependent systems for planning and forecasting. The stakeholders of such systems may be forced to build a parallel network of new interfaces to track the onboarding of new hires, planned retirements, and divestment, which works against one of the aims of master data management. == People, processes and technology == Master data management is enabled by technology, but is more than the technologies that enable it. 
An organization's master data management capability will also include people and processes in its definition. === People === Several roles should be staffed within MDM, most prominently the Data Owner and the Data Steward. Several people would likely be allocated to each role, with each person responsible for a subset of master data (e.g. one Data Owner for employee master data, another for customer master data). The Data Owner is responsible for the requirements for data definition, data quality, data security, etc., as well as for compliance with data governance and data management procedures. The Data Owner should also fund improvement projects in case of deviations from the requirements. The Data Steward runs master data management on behalf of the Data Owner and probably also advises the Data Owner. === Processes === Master data management can be viewed as a "discipline for specialized quality improvement" defined by the policies and procedures put in place by a data governance organization. It has the objective of providing processes for collecting, aggregating, matching, consolidating, quality-assuring, persisting and distributing master data throughout an organization to ensure a common understanding, consistency, accuracy and control, in the ongoing maintenance and application use of that data. Processes commonly seen in master data management include source identification, data collection, data transformation, normalization, rule administration, error detection and correction, data consolidation, data storage, data distribution, data classification, taxonomy services, item master creation, schema mapping, product codification, data enrichment, hierarchy management, business semantics management and data governance. 
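The matching and consolidating steps named above can be illustrated with a minimal Python sketch. The field names, the match key (lower-cased name plus email) and the survivorship rule (first non-empty value wins) are all assumptions chosen for illustration; real MDM tools typically use far richer matching and survivorship rules.

```python
from collections import defaultdict


def match_key(record):
    """Build a crude match key from name and email (illustrative rule only)."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)


def consolidate(records):
    """Group records that share a match key and merge each group into one master record."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)

    masters = []
    for group in groups.values():
        # Remember which source systems contributed to this master record.
        master = {"sources": sorted(r["source"] for r in group)}
        for field in ("name", "email", "phone"):
            # Assumed survivorship rule: the first non-empty value wins.
            master[field] = next(
                (r[field].strip() for r in group if r.get(field, "").strip()), ""
            )
        masters.append(master)
    return masters


# Hypothetical records from two departmental databases.
records = [
    {"source": "marketing", "name": "Ana Lopez", "email": "ANA@example.com", "phone": ""},
    {"source": "service", "name": "ana  lopez", "email": "ana@example.com ", "phone": "555-0100"},
    {"source": "service", "name": "Bo Chen", "email": "bo@example.com", "phone": "555-0101"},
]
masters = consolidate(records)
for m in masters:
    print(m)
```

In this sketch the two records for the same customer collapse into one master record that keeps the phone number known only to the service database, while the unrelated record stays separate.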
=== Technology === A master data management tool can be used to support master data management by removing duplicates, standardizing data (mass maintaining), and incorporating rules that prevent incorrect data from entering the system, in order to create an authoritative source of master data. Master data are the products, accounts, and parties for which the business transactions are completed. Where the technology approach produces a "golden record" or relies on a "source of record" or "system of record", it is common to talk of where the data is "mastered". This is accepted terminology in the information technology industry, but care should be taken, both with specialists and with the wider stakeholder community, to avoid confusing the concept of "master data" with that of "mastering data". ==== Implementation models ==== There are several models for implementing a technology solution for master data management. The choice depends on an organization's core business, its corporate structure, and its goals. The models include source of record, registry, consolidation, coexistence, and transaction/centralized. ===== Source of record ===== This model identifies a single application, database, or simpler source (e.g. a spreadsheet) as being the "source of record" (or "system of record" where solely application databases are relied on). The benefit of this model is its conceptual simplicity, but it may not fit with the realities of complex master data distribution in large organizations. The source of record can be federated, for example by groups of attributes (so that different attributes of a master data entity may have different sources of record) or geographically (so that different parts of an organization may have different master sources). Federation is only applicable in certain use cases, where there is a clear delineation of which subsets of records will be found in which sources. 
The source of record model can be applied more widely than simply to master data, for example to reference data. ==== Transmission of master data ==== There are several ways in which master data may be collated and distributed to other systems. These include: Data consolidation – The process of capturing master data from multiple sources and integrating it into a single hub (operational data store) for replication to other destination systems. Data federation – The process of providing a single virtual view of master data from one or more sources to one or more destination systems. Data propagation – The process of copying master data from one system to another, typically through point-to-point interfaces in legacy systems. == Change management in implementation == Challenges in adopting master data management within large organizations often arise when the "single version of the truth" concept is not affirmed by stakeholders, who believe that their local definition of the master data is necessary. For example, the product hierarchy used to manage inventory may be entirely different from the product hierarchies used to support marketing efforts or pay sales representatives. It is above all necessary to identify if different master data is genuinely required. If it is required, then the solution implemented (technology and process) must be able to allow multiple versions of the truth to exist but will prov

Kdan Mobile

Kdan Mobile Software Limited is a software application development company based in Tainan City, Taiwan. Kdan also has branches in Taipei, Changsha, Irvine (California), Japan, and South Korea. The company was founded in 2009 by Kenny Su, the company's CEO. == History == Kdan Mobile was founded in 2009 by Kenny Su (蘇柏州); it develops an application for PDF documents. Su previously worked at the Industrial Technology Research Institute (ITRI). In 2020, the company completed its Series B round of fundraising, in which it raised 16 million USD in total. Four global firms, Dattoz Partners (South Korea), WI Harper Group (U.S.), Taiwania Capital (Taiwan), and Golden Asia Fund Mitsubishi UFJ Capital (Japan), made up the Series B investment. Kdan previously raised 5 million USD in its Series A round in 2018.

Mata v. Avianca, Inc.

Mata v. Avianca, Inc. was a U.S. District Court for the Southern District of New York case in which the Court dismissed a personal injury case against the airline Avianca and issued a $5,000 fine to the plaintiff's lawyers, who had submitted fake precedents generated by ChatGPT in their legal briefs. == Background == In February 2022, Roberto Mata filed a personal injury lawsuit in the U.S. District Court for the Southern District of New York against Avianca, alleging that he was injured when a metal serving cart struck his knee during an international flight. The plaintiff's lawyers used ChatGPT to generate a legal motion, which contained numerous fake legal cases involving fictitious airlines with fabricated quotations and internal citations. Avianca's lawyers notified the Court that they had been "unable to locate" a few legal cases cited in the legal motion. The Court could not locate the cases either and ordered the plaintiff's lawyers to provide copies of the cited legal cases. Mata's lawyers provided copies of documents purportedly containing all but one of the legal cases, after ChatGPT assured them that the cases "indeed exist" and "can be found in reputable legal databases such as LexisNexis and Westlaw." == Opinion == In June 2023, Judge P. Kevin Castel dismissed the personal injury case against Avianca and ordered the plaintiff's attorneys to pay a $5,000 fine. Judge Castel noted numerous inconsistencies in the opinion summaries, describing one of the legal analyses as "gibberish." Judge Castel held that Mata's lawyers had acted with "subjective bad faith" sufficient for sanctions under Rule 11 of the Federal Rules of Civil Procedure. == Impact == In July 2024, the American Bar Association issued its first formal ethics opinion on the responsibilities of lawyers using generative AI (GAI). The 15-page opinion outlines how the Rules of Professional Conduct apply to the use of GAI in the practice of law. 
Experts caution that lawyers cannot reasonably rely on the accuracy, completeness, or validity of content generated by GAI tools. Due to the continued usage of GAI in the practice of law, Mata has been described as a landmark case by legal professionals, as it is frequently cited by courts in cases where usage of GAI during the course of proceedings leads to the creation and citation of nonexistent caselaw.

T-norm

In mathematics, a t-norm (also T-norm or, unabbreviated, triangular norm) is a kind of binary operation used in the framework of probabilistic metric spaces and in multi-valued logic, specifically in fuzzy logic. A t-norm generalizes intersection in a lattice and conjunction in logic. The name triangular norm refers to the fact that in the framework of probabilistic metric spaces t-norms are used to generalize the triangle inequality of ordinary metric spaces. == Definition == A t-norm is a function T: [0, 1] × [0, 1] → [0, 1] that satisfies the following properties: Commutativity: T(a, b) = T(b, a); Monotonicity: T(a, b) ≤ T(c, d) if a ≤ c and b ≤ d; Associativity: T(a, T(b, c)) = T(T(a, b), c); The number 1 acts as identity element: T(a, 1) = a. Since a t-norm is a binary algebraic operation on the interval [0, 1], infix algebraic notation is also common, with the t-norm usually denoted by ∗. The defining conditions of the t-norm are exactly those of a partially ordered abelian monoid on the real unit interval [0, 1]. (Cf. ordered group.) The monoidal operation of any partially ordered abelian monoid L is therefore by some authors called a triangular norm on L. === Classification of t-norms === A t-norm is called continuous if it is continuous as a function, in the usual interval topology on [0, 1]². (Similarly for left- and right-continuity.) A t-norm is called strict if it is continuous and strictly monotone. A t-norm is called nilpotent if it is continuous and each x in the open interval (0, 1) is nilpotent, that is, there is a natural number n such that x ∗ ... ∗ x (n times) equals 0. A t-norm ∗ is called Archimedean if it has the Archimedean property, that is, if for each x, y in the open interval (0, 1) there is a natural number n such that x ∗ ... ∗ x (n times) is less than or equal to y. 
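The defining properties and the notion of a nilpotent element can be explored numerically. The following Python sketch is an illustration under stated assumptions: it uses the product and Łukasiewicz t-norms (whose formulas appear among the article's prominent examples), checks the four axioms only on a finite grid with a small floating-point tolerance, and the helper names `satisfies_axioms` and `nilpotency_index` are hypothetical. A grid check can refute a candidate but does not prove that a function is a t-norm.

```python
from itertools import product as grid_product

GRID = [i / 10 for i in range(11)]  # sample points in [0, 1]
EPS = 1e-9                          # tolerance for floating-point rounding


def t_prod(a, b):
    """Product t-norm."""
    return a * b


def t_luk(a, b):
    """Lukasiewicz t-norm."""
    return max(0.0, a + b - 1.0)


def satisfies_axioms(t):
    """Check commutativity, associativity, monotonicity and identity on GRID."""
    for a, b, c in grid_product(GRID, repeat=3):
        if abs(t(a, b) - t(b, a)) > EPS:              # commutativity
            return False
        if abs(t(a, t(b, c)) - t(t(a, b), c)) > EPS:  # associativity
            return False
    for a, b, c, d in grid_product(GRID, repeat=4):
        if a <= c and b <= d and t(a, b) > t(c, d) + EPS:  # monotonicity
            return False
    return all(abs(t(a, 1.0) - a) <= EPS for a in GRID)    # 1 is the identity


def nilpotency_index(t, x, max_n=50):
    """Smallest n <= max_n with x * ... * x (n times) equal to 0, else None."""
    power = x
    for n in range(1, max_n + 1):
        if power == 0.0:
            return n
        power = t(power, x)
    return None


def mean(a, b):
    """Arithmetic mean: not a t-norm, used as a counterexample."""
    return (a + b) / 2


print(satisfies_axioms(t_prod), satisfies_axioms(t_luk), satisfies_axioms(mean))
print(nilpotency_index(t_luk, 0.75), nilpotency_index(t_prod, 0.75))
```

With x = 0.75 the Łukasiewicz powers are 0.75, 0.5, 0.25 and 0, so the element is nilpotent, whereas repeated products of 0.75 stay positive within the tested range.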
The usual partial ordering of t-norms is pointwise, that is, T1 ≤ T2 if T1(a, b) ≤ T2(a, b) for all a, b in [0, 1]. As functions, pointwise larger t-norms are sometimes called stronger than those pointwise smaller. In the semantics of t-norm fuzzy logics, however, the larger a t-norm, the weaker (in terms of logical strength) conjunction it represents. == Prominent examples == Minimum t-norm $\top_{\mathrm{min}}(a, b) = \min\{a, b\}$, also called the Gödel t-norm, as it is the standard semantics for conjunction in Gödel fuzzy logic. Besides that, it occurs in most t-norm based fuzzy logics as the standard semantics for weak conjunction. It is the pointwise largest t-norm (see the properties of t-norms below). Product t-norm $\top_{\mathrm{prod}}(a, b) = a \cdot b$ (the ordinary product of real numbers). Besides other uses, the product t-norm is the standard semantics for strong conjunction in product fuzzy logic. It is a strict Archimedean t-norm. Łukasiewicz t-norm $\top_{\mathrm{Luk}}(a, b) = \max\{0, a + b - 1\}$. The name comes from the fact that the t-norm is the standard semantics for strong conjunction in Łukasiewicz fuzzy logic. It is a nilpotent Archimedean t-norm, pointwise smaller than the product t-norm. Drastic t-norm $\top_{\mathrm{D}}(a, b) = \begin{cases} b & \text{if } a = 1 \\ a & \text{if } b = 1 \\ 0 & \text{otherwise.} \end{cases}$ The name reflects the fact that the drastic t-norm is the pointwise smallest t-norm (see the properties of t-norms below). It is a right-continuous Archimedean t-norm. 
Nilpotent minimum $\top_{\mathrm{nM}}(a, b) = \begin{cases} \min(a, b) & \text{if } a + b > 1 \\ 0 & \text{otherwise} \end{cases}$ is a standard example of a t-norm that is left-continuous, but not continuous. Despite its name, the nilpotent minimum is not a nilpotent t-norm. Hamacher product $\top_{\mathrm{H}_0}(a, b) = \begin{cases} 0 & \text{if } a = b = 0 \\ \frac{ab}{a + b - ab} & \text{otherwise} \end{cases}$ is a strict Archimedean t-norm, and an important representative of the parametric classes of Hamacher t-norms and Schweizer–Sklar t-norms. == Properties of t-norms == The drastic t-norm is the pointwise smallest t-norm and the minimum is the pointwise largest t-norm: $\top_{\mathrm{D}}(a, b) \leq \top(a, b) \leq \top_{\mathrm{min}}(a, b)$ for any t-norm $\top$ and all a, b in [0, 1]. In particular, we have that: $\top_{\mathrm{D}}(a, b) \leq \top_{\mathrm{Luk}}(a, b) \leq \top_{\mathrm{prod}}(a, b) \leq \top_{\mathrm{min}}(a, b)$ for all a, b in [0, 1]. For every t-norm T, the number 0 acts as null element: T(a, 0) = 0 for all a in [0, 1]. A t-norm T has zero divisors if and only if it has nilpotent elements; each nilpotent element of T is also a zero divisor of T. The set of all nilpotent elements is an interval [0, a] or [0, a), for some a in [0, 1]. === Properties of continuous t-norms === Although real functions of two variables can be continuous in each variable without being continuous on [0, 1]², this is not the case with t-norms: a t-norm T is continuous if and only if it is continuous in one variable, i.e., if and only if the functions f_y(x) = T(x, y) are continuous for each y in [0, 1]. 
Analogous theorems hold for left- and right-continuity of a t-norm. A continuous t-norm is Archimedean if and only if 0 and 1 are its only idempotents. A continuous Archimedean t-norm is strict if 0 is its only nilpotent element; otherwise it is nilpotent. By definition, moreover, a continuous Archimedean t-norm T is nilpotent if and only if each x < 1 is a nilpotent element of T. Thus with a continuous Archimedean t-norm T, either all or none of the elements of (0, 1) are nilpotent. If it is the case that all elements in (0, 1) are nilpotent, then the t-norm is isomorphic to the Łukasiewicz t-norm; i.e., there is a strictly increasing function f such that $\top(x, y) = f^{-1}(\top_{\mathrm{Luk}}(f(x), f(y)))$. If on the other hand it is the case that there are no nilpotent elements of T, the t-norm is isomorphic to the product t-norm. In other words, all nilpotent t-norms are isomorphic, the Łukasiewicz t-norm being their prototypical representative; and all strict t-norms are isomorphic, with the product t-norm as their prototypical example. The Łukasiewicz t-norm is itself isomorphic to the product t-norm undercut at 0.25, i.e., to the function p(x, y) = max(0.25, x ⋅ y) on [0.25, 1]². For each continuous t-norm, the set of its idempotents is a closed subset of [0, 1]. Its complement (the set of all elements that are not idempotent) is therefore a union of countably many non-overlapping open intervals. The restriction of the t-norm to any of these intervals (including its endpoints) is Archimedean, and thus isomorphic either to the Łukasiewicz t-norm or the product t-norm. For such x, y that do not fall into the same open interval of non-idempotents, the t-norm evaluates to the minimum of x and y. 
These conditions actually give a characterization of continuous t-norms, called the Mostert–Shields theorem, since every continuous t-norm can in this way be decomposed, and the described construction always yields a continuous t-norm. The theorem can also be formulated as follows: A t-norm is continuous if and only if it is isomorphic to an ordinal sum of the minimum, Łukasiewicz, and product t-norm. A similar characterization theorem for non-continuous t-norms is not known (not even for left-continuous ones); only some non-exhaustive methods for the construction of t-norms have been found. == Residuum == For any left-continuous t-norm $\top$, there is a unique binary operation $\Rightarrow$ on [0, 1] such that $\top(z, x) \leq y$ if and only if $z \leq (x \Rightarrow y)$ for all x, y, z in [0, 1]. This operation is called the residuum of the t-norm. In prefix notation, the residuum of a t-norm $\top$ is often denoted by $\vec{\top}$ or by the letter R. The interval [0, 1] equipped with a t-norm and its residuum forms a residuated lattice. The relation between a t-norm T and its residuum R is an instance of adjunction (specifically, a Galois connection): the residuum forms a right adjoint R(x, –) to the functor T(–, x) for each x in the lattice [0, 1] taken as a poset category. In the standard semantics of t-norm based fuzzy logics, where conjunction is interpreted by a t-norm, the residuum plays the role of implication (often

Orion's Arm

The Orion's Arm Universe Project (OA) is a multi-authored online hard science fiction world-building project, first established in 2000 by M. Alan Kazlev, Donna Malcolm Hirsekorn, Bernd Helfert and Anders Sandberg and further co-authored by many people since. Anyone can contribute articles, stories, artwork, or music to the website. The first published Orion's Arm book, a collection of five novellas set within the OA universe, called Against a Diamond Sky, was released in September 2009. == Canon == The fictional setting of Orion's Arm is about 10,000 years in the future, when an interstellar civilization has spread across thousands of light-years, with inhabited planets and space habitats. Its inhabitants range from humans to extensively modified human beings, including superhumans with advanced augmentations and internal AI systems, while most people exist as software. Engineered wormholes are used for interstellar travel and transport, although not for time travel. The setting also includes several alien civilizations and evidence of more advanced alien societies in the past. At its highest levels, directed human evolution has produced vast godlike beings linked across interstellar distances, capable of understanding and creating technologies beyond ordinary minds. == Reception == Orion's Arm has been reviewed in the role-playing magazine Knights of the Dinner Table, as well as on Boing Boing by transhumanist science fiction author Cory Doctorow. References to the Encyclopaedia Galactica have been made in a book on overcoming librarian stereotypes. The Orion's Arm website has also been recommended in a children's teaching guide.

Similarity learning

Similarity learning is an area of supervised machine learning in artificial intelligence. It is closely related to regression and classification, but the goal is to learn a similarity function that measures how similar or related two objects are. It has applications in ranking, in recommendation systems, visual identity tracking, face verification, and speaker verification. == Learning setup == There are four common setups for similarity and metric distance learning. Regression similarity learning: In this setup, pairs of objects $(x_i^1, x_i^2)$ are given together with a measure of their similarity $y_i \in \mathbb{R}$. The goal is to learn a function that approximates $f(x_i^1, x_i^2) \sim y_i$ for every new labeled triplet example $(x_i^1, x_i^2, y_i)$. This is typically achieved by minimizing a regularized loss $\min_{w} \sum_i \mathrm{loss}(w; x_i^1, x_i^2, y_i) + \mathrm{reg}(w)$. Classification similarity learning: Given are pairs of similar objects $(x_i, x_i^+)$ and non-similar objects $(x_i, x_i^-)$. An equivalent formulation is that every pair $(x_i^1, x_i^2)$ is given together with a binary label $y_i \in \{0, 1\}$ that determines if the two objects are similar or not. The goal is again to learn a classifier that can decide if a new pair of objects is similar or not. Ranking similarity learning: Given are triplets of objects $(x_i, x_i^+, x_i^-)$ whose relative similarity obeys a predefined order: $x_i$ is known to be more similar to $x_i^+$ than to $x_i^-$. 
The goal is to learn a function $f$ such that for any new triplet of objects $(x, x^+, x^-)$, it obeys $f(x, x^+) > f(x, x^-)$ (contrastive learning). This setup assumes a weaker form of supervision than in regression, because instead of providing an exact measure of similarity, one only has to provide the relative order of similarity. For this reason, ranking-based similarity learning is easier to apply in real large-scale applications. Locality sensitive hashing (LSH): Hashes input items so that similar items map to the same "buckets" in memory with high probability (the number of buckets being much smaller than the universe of possible input items). It is often applied in nearest neighbor search on large-scale high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases. A common approach for learning similarity is to model the similarity function as a bilinear form. For example, in the case of ranking similarity learning, one aims to learn a matrix W that parametrizes the similarity function $f_W(x, z) = x^{T} W z$. When data is abundant, a common approach is to learn a siamese network – a deep network model with parameter sharing. == Metric learning == Similarity learning is closely related to distance metric learning. Metric learning is the task of learning a distance function over objects. A metric or distance function has to obey four axioms: non-negativity, identity of indiscernibles, symmetry and subadditivity (or the triangle inequality). In practice, metric learning algorithms ignore the condition of identity of indiscernibles and learn a pseudo-metric. 
When the objects $x_i$ are vectors in $\mathbb{R}^d$, then any matrix $W$ in the symmetric positive semi-definite cone $S_+^d$ defines a distance pseudo-metric of the space of x through the form $D_W(x_1, x_2)^2 = (x_1 - x_2)^\top W (x_1 - x_2)$. When $W$ is a symmetric positive definite matrix, $D_W$ is a metric. Moreover, as any symmetric positive semi-definite matrix $W \in S_+^d$ can be decomposed as $W = L^\top L$ where $L \in \mathbb{R}^{e \times d}$ and $e \geq \mathrm{rank}(W)$, the distance function $D_W$ can be rewritten equivalently as $D_W(x_1, x_2)^2 = (x_1 - x_2)^\top L^\top L (x_1 - x_2) = \|L(x_1 - x_2)\|_2^2$. The distance $D_W(x_1, x_2)^2 = \|x_1' - x_2'\|_2^2$ corresponds to the Euclidean distance between the transformed feature vectors $x_1' = L x_1$ and $x_2' = L x_2$. Many formulations for metric learning have been proposed. Some well-known approaches for metric learning include learning from relative comparisons, which is based on the triplet loss, large margin nearest neighbor, and information theoretic metric learning (ITML). In statistics, the covariance matrix of the data is sometimes used to define a distance metric called Mahalanobis distance. == Applications == Similarity learning is used in information retrieval for learning to rank, in face verification or face identification, and in recommendation systems. Also, many machine learning approaches rely on some metric. 
This includes unsupervised learning such as clustering, which groups together close or similar objects. It also includes supervised approaches like the K-nearest neighbor algorithm, which rely on labels of nearby objects to decide on the label of a new object. Metric learning has been proposed as a preprocessing step for many of these approaches. == Scalability == Metric and similarity learning scale quadratically with the dimension of the input space, as can easily be seen when the learned metric has a bilinear form $f_W(x, z) = x^{T} W z$. Scaling to higher dimensions can be achieved by enforcing a sparseness structure over the matrix model, as done with HDSL, and with COMET. == Software == metric-learn is a free software Python library which offers efficient implementations of several supervised and weakly-supervised similarity and metric learning algorithms. The API of metric-learn is compatible with scikit-learn. OpenMetricLearning is a Python framework to train and validate the models producing high-quality embeddings. == Further information == For further information on this topic, see the surveys on metric and similarity learning by Bellet et al. and Kulis.
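As a worked illustration of the metric learning section, the following Python sketch checks on a toy example that the quadratic form with W = LᵀL equals the squared Euclidean distance between the transformed vectors Lx₁ and Lx₂. The matrix L and the vectors are arbitrary values chosen for illustration, not learned from data, and the helper names are hypothetical. Because the chosen L has rank 1 while d = 3, W is only positive semi-definite, so the result is a pseudo-metric: two distinct points can be at distance zero.

```python
def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def mat_vec(m, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mat_mul(a, b):
    """Matrix-matrix product."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def dist_sq_w(w, x1, x2):
    """D_W(x1, x2)^2 = (x1 - x2)^T W (x1 - x2)."""
    d = [p - q for p, q in zip(x1, x2)]
    return sum(di * wi for di, wi in zip(d, mat_vec(w, d)))


def dist_sq_l(l, x1, x2):
    """Squared Euclidean distance between the transformed vectors L x1 and L x2."""
    t1, t2 = mat_vec(l, x1), mat_vec(l, x2)
    return sum((p - q) ** 2 for p, q in zip(t1, t2))


# Illustrative values only: L has shape e x d with e = 1 and d = 3.
L = [[1.0, 2.0, 0.0]]
W = mat_mul(transpose(L), L)  # W = L^T L, symmetric positive semi-definite

x1 = [1.0, 0.0, 3.0]
x2 = [0.0, 1.0, 1.0]
x3 = [1.0, 0.0, 5.0]  # differs from x1 only in a direction that L ignores

print(dist_sq_w(W, x1, x2), dist_sq_l(L, x1, x2))  # both forms agree
print(dist_sq_w(W, x1, x3))                        # zero although x1 != x3
```

The last line shows why identity of indiscernibles fails when W is singular: any difference lying in the null space of L contributes nothing to the distance.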

Clanker

Clanker is a derogatory term for robots and artificial intelligence (AI) software. The term has been used in Star Wars media, first appearing in the franchise's 2005 video game Star Wars: Republic Commando. In 2025, the term became widely used to express hatred or distaste for machines ranging from delivery robots to large language models. This trend has been attributed to anxiety around the negative societal effects of AI. == In science fiction == The term has previously been used in science fiction literature, first appearing in a 1958 article by William Tenn in which he uses it to describe robots from science fiction films like Metropolis. The Star Wars franchise began using the term as a slur against droids in the 2005 video game Star Wars: Republic Commando; it was later used prominently in the animated series Star Wars: The Clone Wars, which follows a galaxy-wide war between the Galactic Republic's clone troopers and the Confederacy of Independent Systems' battle droids. In Star Wars media, robots (more commonly known as droids) are routinely depicted as the subjects of discrimination. For example, in the original Star Wars film, C-3PO and R2-D2 are abducted by Jawas and sold to the family of Luke Skywalker. When visiting a cantina in Mos Eisley, both droids are refused service by the bartender, who remarks that "We don't serve their kind." In Star Wars lore, the term clanker had entered use by the time of the franchise's High Republic Era and became prominent during the Clone Wars, in which clone troopers regularly use the phrase against battle droids. == AI backlash == The growing popularity of the term clanker reflects an increase in direct contact between people and AI systems. On sidewalks, delivery robots impede mobility and cause safety issues. In digital spaces, cybersecurity experts have raised concerns about the rising number of bots online, which now make up a large portion of internet traffic. 
A 2025 report estimated that about one in five social media accounts are automated. The term is also a reaction to AI advocacy from industrialists like Elon Musk and Sam Altman, who have championed the integration of AI into nearly every aspect of modern life. This includes efforts by major companies and startups alike, such as Amazon's development of humanoid robots to replace human workers in service industries. Such initiatives have further fueled public skepticism, reinforcing the association of clanker with unease over automation and the displacement of human roles. A global survey conducted by the research firm Gartner in December 2023 found that 64% of customers would prefer companies to avoid using AI in customer service, and that 53% would consider switching to a different company if they discovered AI was handling their service interactions. Another report by Ernst & Young, published in July 2025, found that 42% of employees across Europe are worried that the use of AI in the workplace may threaten their employment. Criticism has also been directed at the technology itself. Some of the backlash stems from concerns about the resource consumption of AI systems, their frequent reliance on copyrighted material without consent, and questions about the intentions of the corporations behind them. There are also concerns about the potential cognitive effects of relying heavily on AI. A study authored by researchers at Microsoft and Carnegie Mellon University warns that regular dependence on AI may leave users mentally unprepared for real-world problem solving, likening the effect to cognitive atrophy. 
In June 2025, United States Senator Ruben Gallego tweeted that his "new bill makes sure you don't have to talk to a clanker if you don't want to", referring to proposed legislation that would require call centers to disclose their use of automated customer service agents to callers in the United States and offer the option to switch to a human representative. == Analysis == Linguist Adam Aleksic has described clanker as an evolution of racial slurs that anthropomorphize robotic systems. Internet memes incorporating the term often reference historical discrimination against marginalized groups such as African Americans. Based on the work of linguist Geoffrey Nunberg, American news website Axios has argued that clanker is merely a derogatory word, rather than a slur, because it does not perpetuate social inequities. NPR has noted the irony that the word robot was coined by Karel Čapek for his 1920 science-fiction play R.U.R. as a similar criticism of industrialization forcing workers to become devoid of their humanity. Aleksic has observed that robot can be further traced to the Proto-Slavic noun orbъ, which means 'slave'. While other science fiction media include pejoratives for androids and robots, such as skinjob and toaster from the Blade Runner and Battlestar Galactica franchises, respectively, clanker is believed to have gained popularity because its usage is intuitive and flexible. Whereas AI slop describes low-quality output from artificial intelligence, clanker belittles the underlying computer systems.