Example-based machine translation

Example-based machine translation

Example-based machine translation (EBMT) is a method of machine translation often characterized by its use of a bilingual corpus with parallel texts as its main knowledge base at run-time. It is essentially a translation by analogy and can be viewed as an implementation of a case-based reasoning approach to machine learning. == Translation by analogy == At the foundation of example-based machine translation is the idea of translation by analogy. When applied to the process of human translation, the idea that translation takes place by analogy is a rejection of the idea that people translate sentences by doing deep linguistic analysis. Instead, it is founded on the belief that people translate by first decomposing a sentence into certain phrases, then by translating these phrases, and finally by properly composing these fragments into one long sentence. Phrasal translations are translated by analogy to previous translations. The principle of translation by analogy is encoded to example-based machine translation through the example translations that are used to train such a system. Other approaches to machine translation, including statistical machine translation, also use bilingual corpora to learn the process of translation. == History == Example-based machine translation was first suggested by Makoto Nagao in 1984. He pointed out that it is especially adapted to translation between two totally different languages, such as English and Japanese. In this case, one sentence can be translated into several well-structured sentences in another language, therefore, it is no use to do the deep linguistic analysis characteristic of rule-based machine translation. == Example == Example-based machine translation systems are trained from bilingual parallel corpora containing sentence pairs like the example shown in the table above. Sentence pairs contain sentences in one language with their translations into another. The particular example shows an example of a minimal pair, meaning that the sentences vary by just one element. These sentences make it simple to learn translations of portions of a sentence. For example, an example-based machine translation system would learn three units of translation from the above example: How much is that X ? corresponds to Ano X wa ikura desu ka. red umbrella corresponds to akai kasa small camera corresponds to chiisai kamera Composing these units can be used to produce novel translations in the future. For example, if we have been trained using some text containing the sentences: President Kennedy was shot dead during the parade. and The convict escaped on July 15th., then we could translate the sentence The convict was shot dead during the parade. by substituting the appropriate parts of the sentences. == Phrasal verbs == Example-based machine translation is best suited for sub-language phenomena like phrasal verbs. Phrasal verbs have highly context-dependent meanings. They are common in English, where they comprise a verb followed by an adverb and/or a preposition, which are called the particle to the verb. Phrasal verbs produce specialized context-specific meanings that may not be derived from the meaning of the constituents. There is almost always an ambiguity during word-to-word translation from source to the target language. As an example, consider the phrasal verb "put on" and its Hindustani translation. It may be used in any of the following ways: Ram put on the lights. (Switched on) (Hindustani translation: Jalana) Ram put on a cap. (Wear) (Hindustani translation: Pahenna)

Human image synthesis

Human image synthesis is technology that can be applied to make believable and even photorealistic renditions of human-likenesses, moving or still. It has effectively existed since the early 2000s. Many films using computer generated imagery have featured synthetic images of human-like characters digitally composited onto the real or other simulated film material. Towards the end of the 2010s deep learning artificial intelligence has been applied to synthesize images and video that look like humans, without need for human assistance, once the training phase has been completed, whereas the old school 7D-route required massive amounts of human work. == Timeline of human image synthesis == In 1971 Henri Gouraud made the first CG geometry capture and representation of a human face. Modeling was his wife Sylvie Gouraud. The 3D model was a simple wire-frame model and he applied the Gouraud shader he is most known for to produce the first known representation of human-likeness on computer. The 1972 short film A Computer Animated Hand by Edwin Catmull and Fred Parke was the first time that computer-generated imagery was used in film to simulate moving human appearance. The film featured a computer simulated hand and face (watch film here). The 1976 film Futureworld reused parts of A Computer Animated Hand on the big screen. The 1983 music video for song Musique Non-Stop by German band Kraftwerk aired in 1986. Created by the artist Rebecca Allen, it features non-realistic looking, but clearly recognizable computer simulations of the band members. The 1994 film The Crow was the first film production to make use of digital compositing of a computer simulated representation of a face onto scenes filmed using a body double. Necessity was the muse as the actor Brandon Lee portraying the protagonist was tragically killed accidentally on-stage. In 1999 Paul Debevec et al. of USC captured the reflectance field of a human face with their first version of a light stage. They presented their method at the SIGGRAPH 2000 In 2003 audience debut of photo realistic human-likenesses in the 2003 films The Matrix Reloaded in the burly brawl sequence where up-to-100 Agent Smiths fight Neo and in The Matrix Revolutions where at the start of the end showdown Agent Smith's cheekbone gets punched in by Neo leaving the digital look-alike unnaturally unhurt. The Matrix Revolutions bonus DVD documents and depicts the process in some detail and the techniques used, including facial motion capture and limbal motion capture, and projection onto models. In 2003 The Animatrix: Final Flight of the Osiris a state-of-the-art want-to-be human likenesses not quite fooling the watcher made by Square Pictures. In 2003 digital likeness of Tobey Maguire was made for movies Spider-man 2 and Spider-man 3 by Sony Pictures Imageworks. In 2005 the Face of the Future project was an established. by the University of St Andrews and Perception Lab, funded by the EPSRC. The website contains a "Face Transformer", which enables users to transform their face into any ethnicity and age as well as the ability to transform their face into a painting (in the style of either Sandro Botticelli or Amedeo Modigliani). This process is achieved by combining the user's photograph with an average face. In 2009 Debevec et al. presented new digital likenesses, made by Image Metrics, this time of actress Emily O'Brien whose reflectance was captured with the USC light stage 5 Motion looks fairly convincing contrasted to the clunky run in the Animatrix: Final Flight of the Osiris which was state-of-the-art in 2003 if photorealism was the intention of the animators. In 2009 a digital look-alike of a younger Arnold Schwarzenegger was made for the movie Terminator Salvation though the end result was critiqued as unconvincing. Facial geometry was acquired from a 1984 mold of Schwarzenegger. In 2010 Walt Disney Pictures released a sci-fi sequel entitled Tron: Legacy with a digitally rejuvenated digital look-alike of actor Jeff Bridges playing the antagonist CLU. In SIGGGRAPH 2013 Activision and USC presented a real-time "Digital Ira" a digital face look-alike of Ari Shapiro, an ICT USC research scientist, utilizing the USC light stage X by Ghosh et al. for both reflectance field and motion capture. The end result both precomputed and real-time rendering with the modernest game GPU shown here and looks fairly realistic. In 2014 The Presidential Portrait by USC Institute for Creative Technologies in conjunction with the Smithsonian Institution was made using the latest USC mobile light stage wherein President Barack Obama had his geometry, textures and reflectance captured. In 2014 Ian Goodfellow et al. presented the principles of a generative adversarial network. GANs made the headlines in early 2018 with the deepfakes controversies. For the 2015 film Furious 7 a digital look-alike of actor Paul Walker who died in an accident during the filming was done by Weta Digital to enable the completion of the film. In 2016 techniques which allow near real-time counterfeiting of facial expressions in existing 2D video have been believably demonstrated. In 2016 a digital look-alike of Peter Cushing was made for the Rogue One film where its appearance would appear to be of same age as the actor was during the filming of the original 1977 Star Wars film. In SIGGRAPH 2017 an audio driven digital look-alike of upper torso of Barack Obama was presented by researchers from University of Washington. It was driven only by a voice track as source data for the animation after the training phase to acquire lip sync and wider facial information from training material consisting 2D videos with audio had been completed. Late 2017 and early 2018 saw the surfacing of the deepfakes controversy where porn videos were doctored using deep machine learning so that the face of the actress was replaced by the software's opinion of what another persons face would look like in the same pose and lighting. In 2018 Game Developers Conference Epic Games and Tencent Games demonstrated "Siren", a digital look-alike of the actress Bingjie Jiang. It was made possible with the following technologies: CubicMotion's computer vision system, 3Lateral's facial rigging system and Vicon's motion capture system. The demonstration ran in near real time at 60 frames per second in the Unreal Engine 4. In 2018 at the World Internet Conference in Wuzhen the Xinhua News Agency presented two digital look-alikes made to the resemblance of its real news anchors Qiu Hao (Chinese language) and Zhang Zhao (English language). The digital look-alikes were made in conjunction with Sogou. Neither the speech synthesis used nor the gesturing of the digital look-alike anchors were good enough to deceive the watcher to mistake them for real humans imaged with a TV camera. In September 2018 Google added "involuntary synthetic pornographic imagery" to its ban list, allowing anyone to request the search engine block results that falsely depict them as "nude or in a sexually explicit situation." In February 2019 Nvidia open sources StyleGAN, a novel generative adversarial network. Right after this Phillip Wang made the website ThisPersonDoesNotExist.com with StyleGAN to demonstrate that unlimited amounts of often photo-realistic looking facial portraits of no-one can be made automatically using a GAN. Nvidia's StyleGAN was presented in a not yet peer reviewed paper in late 2018. At the June 2019 CVPR the MIT CSAIL presented a system titled "Speech2Face: Learning the Face Behind a Voice" that synthesizes likely faces based on just a recording of a voice. It was trained with massive amounts of video of people speaking. Since 1 July 2019 Virginia has criminalized the sale and dissemination of unauthorized synthetic pornography, but not the manufacture., as § 18.2–386.2 titled 'Unlawful dissemination or sale of images of another; penalty.' became part of the Code of Virginia. The law text states: "Any person who, with the intent to coerce, harass, or intimidate, maliciously disseminates or sells any videographic or still image created by any means whatsoever that depicts another person who is totally nude, or in a state of undress so as to expose the genitals, pubic area, buttocks, or female breast, where such person knows or has reason to know that he is not licensed or authorized to disseminate or sell such videographic or still image is guilty of a Class 1 misdemeanor.". The identical bills were House Bill 2678 presented by Delegate Marcus Simon to the Virginia House of Delegates on 14 January 2019 and three-day later an identical Senate bill 1736 was introduced to the Senate of Virginia by Senator Adam Ebbin. Since 1 September 2019 Texas senate bill SB 751 amendments to the election code came into effect, giving candidates in elections a 30-day protection period to the elections during which making and distributing digital look-alikes or synthetic fakes of the candidates is an offense. Th

Bulletin (service)

Bulletin was an online newsletter platform launched by Facebook on July 6, 2021, that allows notable writers to make announcements directly to their subscribers. Its competitors included Substack, of which Bulletin was called a "near-clone." Writers participating in the platform's launch included Malcolm Gladwell, Mitch Albom, Tan France, Jessica Yellin, Jane Wells, Erin Andrews and Dorie Greenspan. Facebook CEO Mark Zuckerberg stated that Bulletin represented the first time that the company had "built a project that is directly for journalists and individual writers." In October 2022 Meta announced the shutdown of Bulletin. The platform went into read only mode in January 2023 and became unavailable in April 2023. == History == Facebook announced Bulletin as its online newsletter platform on June 29, 2021. and launched by the company on July 6, 2021. Facebook CEO Mark Zuckerberg touted the service by saying that Bulletin represented the first time that the company had "built a project that is directly for journalists and individual writers." Writers participating in the platform's launch included Malcolm Gladwell, Mitch Albom, Tan France, Jessica Yellin, Jane Wells, Erin Andrews and Dorie Greenspan. == Reception == Unlike competitor such as Substack, Facebook indicated upon service's launch that it would not take a cut of subscription fees of writers using that platform. According to Washington Post technology writer Will Oremus, the move was criticized by those who viewed it as a form of predatory pricing intended by Facebook to force those competitors out of business. Sandeep Vaheesan, legal director of the think tank Open Markets, called for the government to reexamine predatory pricing as a violation of antitrust law, saying, "We want companies to compete by making better products, investing in new equipment and tech — not purely relying on their financial advantages to capture market share."

Push technology

Push technology, also known as server push, is a communication method where the communication is initiated by a server rather than a client. This approach is different from the "pull" method where the communication is initiated by a client. In push technology, clients can express their preferences for certain types of information or data, typically through a process known as the publish–subscribe model. In this model, a client "subscribes" to specific information channels hosted by a server. When new content becomes available on these channels, the server automatically sends, or "pushes," this information to the subscribed client. Under certain conditions, such as restrictive security policies that block incoming HTTP requests, push technology is sometimes simulated using a technique called polling. In these cases, the client periodically checks with the server to see if new information is available, rather than receiving automatic updates. == General use == Synchronous conferencing and instant messaging are examples of push services. Chat messages and sometimes files are pushed to the user as soon as they are received by the messaging service. Both decentralized peer-to-peer programs (such as WASTE) and centralized programs (such as IRC or XMPP) allow pushing files, which means the sender initiates the data transfer rather than the recipient. Email may also be a push system: SMTP is a push protocol (see Push e-mail). However, the last step—from mail server to desktop computer—typically uses a pull protocol like POP3 or IMAP. Modern e-mail clients make this step seem instantaneous by repeatedly polling the mail server, frequently checking it for new mail. The IMAP protocol includes the IDLE command, which allows the server to tell the client when new messages arrive. The original BlackBerry was the first popular example of push-email in a wireless context. Another example is the PointCast Network, which was widely covered in the 1990s. It delivered news and stock market data as a screensaver. Both Netscape and Microsoft integrated push technology through the Channel Definition Format (CDF) into their software at the height of the browser wars, but it was never very popular. CDF faded away and was removed from the browsers of the time, replaced in the 2000s with RSS (a pull system.) Other uses of push-enabled web applications include software updates distribution ("push updates"), market data distribution (stock tickers), online chat/messaging systems (webchat), auctions, online betting and gaming, sport results, monitoring consoles, and sensor network monitoring. == Examples == === Web push === The Web push proposal of the Internet Engineering Task Force is a simple protocol using HTTP version 2 to deliver real-time events, such as incoming calls or messages, which can be delivered (or "pushed") in a timely fashion. The protocol consolidates all real-time events into a single session which ensures more efficient use of network and radio resources. A single service consolidates all events, distributing those events to applications as they arrive. This requires just one session, avoiding duplicated overhead costs. Web Notifications are part of the W3C standard and define an API for end-user notifications. A notification allows alerting the user of an event, such as the delivery of an email, outside the context of a web page. As part of this standard, Push API is fully implemented in Chrome, Firefox, and Edge, and partially implemented in Safari as of February 2023. === HTTP server push === HTTP server push (also known as HTTP streaming) is a mechanism for sending unsolicited (asynchronous) data from a web server to a web browser. HTTP server push can be achieved through any of several mechanisms. As a part of HTML5 the Web Socket API allows a web server and client to communicate over a full-duplex TCP connection. Generally, the web server does not terminate a connection after response data has been served to a client. The web server leaves the connection open so that if an event occurs (for example, a change in internal data which needs to be reported to one or multiple clients), it can be sent out immediately; otherwise, the event would have to be queued until the client's next request is received. Most web servers offer this functionality via CGI (e.g., Non-Parsed Headers scripts on Apache HTTP Server). The underlying mechanism for this approach is chunked transfer encoding. Another mechanism is related to a special MIME type called multipart/x-mixed-replace, which was introduced by Netscape in 1995. Web browsers interpret this as a document that changes whenever the server pushes a new version to the client. It is still supported by Firefox, Opera, and Safari today, but it is ignored by Internet Explorer and is only partially supported by Chrome. It can be applied to HTML documents, and also for streaming images in webcam applications. The WHATWG Web Applications 1.0 proposal includes a mechanism to push content to the client. On September 1, 2006, the Opera web browser implemented this new experimental system in a feature called "Server-Sent Events". It is now part of the HTML5 standard. === Pushlet === In this technique, the server takes advantage of persistent HTTP connections, leaving the response perpetually "open" (i.e., the server never terminates the response), effectively fooling the browser to remain in "loading" mode after the initial page load could be considered complete. The server then periodically sends snippets of JavaScript to update the content of the page, thereby achieving push capability. By using this technique, the client doesn't need Java applets or other plug-ins in order to keep an open connection to the server; the client is automatically notified about new events, pushed by the server. One serious drawback to this method, however, is the lack of control the server has over the browser timing out; a page refresh is always necessary if a timeout occurs on the browser end. === Long polling === Long polling is itself not a true push; long polling is a variation of the traditional polling technique, but it allows emulating a push mechanism under circumstances where a real push is not possible, such as sites with security policies that require rejection of incoming HTTP requests. With long polling, the client requests to get more information from the server exactly as in normal polling, but with the expectation that the server may not respond immediately. If the server has no new information for the client when the poll is received, then instead of sending an empty response, the server holds the request open and waits for response information to become available. Once it does have new information, the server immediately sends an HTTP response to the client, completing the open HTTP request. Upon receipt of the server response, the client often immediately issues another server request. In this way the usual response latency (the time between when the information first becomes available and the next client request) otherwise associated with polling clients is eliminated. For example, BOSH is a popular, long-lived HTTP technique used as a long-polling alternative to a continuous TCP connection when such a connection is difficult or impossible to employ directly (e.g., in a web browser); it is also an underlying technology in the XMPP, which Apple uses for its iCloud push support. === Flash XML Socket relays === This technique, used by chat applications, makes use of the XML Socket object in a single-pixel Adobe Flash movie. Under the control of JavaScript, the client establishes a TCP connection to a unidirectional relay on the server. The relay server does not read anything from this socket; instead, it immediately sends the client a unique identifier. Next, the client makes an HTTP request to the web server, including this identifier with it. The web application can then push messages addressed to the client to a local interface of the relay server, which relays them over the Flash socket. The advantage of this approach is that it appreciates the natural read-write asymmetry that is typical of many web applications, including chat, and as a consequence it offers high efficiency. Since it does not accept data on outgoing sockets, the relay server does not need to poll outgoing TCP connections at all, making it possible to hold open tens of thousands of concurrent connections. In this model, the limit to scale is the TCP stack of the underlying server operating system. === Reliable Group Data Delivery (RGDD) === In services such as cloud computing, to increase reliability and availability of data, it is usually pushed (replicated) to several machines. For example, the Hadoop Distributed File System (HDFS) makes 2 extra copies of any object stored. RGDD focuses on efficiently casting an object from one location to many while saving bandwidth by sending minimal number of copies (only one in the best case) of

Bare machine

In information technology, a bare machine (or bare-metal computer) is a computer which has no operating system. The software executed by a bare machine, commonly called a bare metal program or bare metal application, is designed to interact directly with hardware. Bare machines are widely used in embedded systems, particularly in cases where resources are limited or high performance is required. == Bare machine computing == Bare Machine Computing is a computing paradigm in which application software runs directly on a bare machine as a single, stand-alone executable, without an operating system or device drivers. The application software has direct access to hardware resources, and there is typically no distinction between user and kernel mode. It is self-managed software that boots, loads and runs without using any other software components. Bare metal programs are typically written in a close-to-hardware language such as C or assembly language. == Advantages == Typically, a bare-metal application will run faster, use less memory and be more power efficient than an equivalent program that relies on an operating system, due to the inherent overhead imposed by system calls. For example, hardware inputs and outputs are directly accessible to bare metal software, whereas they must usually be accessed through system calls when using an OS. It has no OS and therefore has no OS-related vulnerabilities. == Disadvantages == Bare metal applications typically require more effort to develop because operating system services such as memory management and task scheduling are not available. Debugging a bare-metal program may be complicated by factors such as: Lack of a standard output. The target machine may differ from the hardware used for program development (e.g., emulator, simulator). This forces setting up a way to load the bare-metal program onto the target (flashing), start the program execution and access the target resources. == Examples == === Early computers === Early computers, such as the PDP-11, allowed programmers to load a program, supplied in machine code, to RAM. The resulting operation of the program could be monitored by lights, and output derived from magnetic tape, print devices, or storage. Amdahl UTS's performance improves by 25% when run on bare metal without VM, the company said in 1986. === Embedded systems === Bare machine programming is a common practice in embedded systems, in which microcontrollers or microprocessors boot directly into monolithic, single-purpose software without loading an operating system. Such embedded software can vary in structure. For example, one such program paradigm, known as foreground-background or superloop architecture, consists of an infinite main loop in which each task is executed sequentially and must voluntarily return control back to the loop. The loop runs these cooperative background processes that are not time-critical, while interrupt service routines momentarily interrupt the loop to handle time-critical foreground tasks.

Oculus Quill

Quill is a painting and animation software for virtual reality. It runs on Microsoft Windows with Oculus Rift headsets. It is used to create 3D paintings and animated cartoons. Quill was released on November 29, 2016, on the Oculus Store. Theater Elsewhere(formerly Quill Theater), an application for viewing creations made in Quill, was later made available following the release of the Oculus Quest. In September 2021, Facebook, now known as Meta Platforms, and the owner of Oculus, sold Quill to its original creator, who continues to develop and support the app. == Development == Quill was originally developed by Oculus Story Studio as an internal tool for the creative needs of the studio's project Dear Angelica directed by Saschka Unseld along with its art-director Wesley Allsbrook. == Controls == The software works on Oculus Rift utilizing its 6DoF motion controllers. Users can paint in 3D space using their hands naturally, and animate those paintings with keyframes. They can also capture videos and photos of their creations. == Reception == Dear Angelica, a VR story fully painted in Quill, was nominated for an Emmy Award in 2017.

Commercial skipping

Commercial skipping is a feature of some digital video recorders that makes it possible to automatically skip commercials in recorded programs. This feature created controversy, with major television networks and movie studios claiming it violates copyright and should be banned. == History == After the video cassette recorder (VCR) became popular in the 1980s, the television industry began studying the impact of users fast forwarding through commercials. Advertising agencies fought the trend by making them more entertaining. For many years, video recorders manufactured for the Japanese market have been able to skip advertisements automatically, which is done by detecting when foreign language audio overdub tracks provided for many programmes go silent, as advertisements were broadcast with a single language only. The first digital video recorder (DVR) with a built-in commercial skipping feature was ReplayTV with its "4000 Series" and "5000 Series" units. In 2002, the main television networks and movie studios sued ReplayTV, claiming that skipping advertisements during replay violates copyright. Later, five owners of ReplayTV represented by Electronic Frontier Foundation and attorneys Ira Rothken and Richard Wiebe countersued, asking the federal judge to uphold consumers' rights to record TV shows and skip commercials, claiming that features like commercial skipping help parents protect their kids from excessive consumerism. ReplayTV ended up filing for bankruptcy in 2003 after fighting a copyright infringement suit over the ReplayTV's ability to skip commercials. === Commercial skipping software === In addition to the DVR devices which existed in the private market since the late 1990s, towards the mid-2000s, due to the significant advances in home computers, Home theater PCs started gaining popularity in the private market and many users began using their Home theater PCs in their living room for entertainment purposes. Following this, many DVR programs were developed, including popular programs such as Windows Media Center, which contained all of the features of the DVR devices in addition to advanced features such as HDTV and the use of Multiple TV Tuner Cards. Some independent developers began developing independent software capable of skipping the commercial segments when playing recorded videos, and permanently removing the commercial segments from recorded video files. By 2014, many DVR programs such as Windows Media Center, SageTV and MythTV had the capability to skip commercials segments in recorded TV broadcasts after installing third-party add-ons such as DVRMSToolbox, Comskip and ShowAnalyzer, which use various advanced techniques to locate the commercial segments in the video files and save their locations to text files. The text files can also be fed into programs such as MEncoder or DVRMSToolboxGUI which can delete the commercial segments from the recorded video files. A few third-party tools such as MCEBuddy automate detection and removal/marking of commercials. One of the weaknesses of commercial skippers is that, operating automatically, they may misidentify program material as a commercial. Some programs like MCEBuddy provide the ability to fine-tune commercial detection for groups of files (e.g. by channel or country) and provide tools to manually fine-tune commercial segments for individual files. In May 2012, the US Dish Network began offering a DVR with what it calls AutoHop. The device would automatically skip commercials when displaying programming that the viewer had previously recorded with the PrimeTime Anytime feature. It does not skip ads on any live programs. US broadcasters were angered at the news, and FOX embarked on legal action. Most, but not all, of Fox's claims were dismissed; ultimately an agreement was reached whereby AutoHop would only become available for Fox stations seven days after a program is transmitted; terms of the settlement were not disclosed. == The future of TV advertisements == The introduction of digital video recorders and services with skipping and fast-forward capabilities enables viewers to avoid viewing interruptive advertisements in recorded programs, either manually or automatically. While advertising separate to television shows can be skipped, advertising in TV shows themselves ("product placement") cannot be skipped. Streaming services such as Hulu show shorter advertisements with a countdown timer and tailored to the viewers interests, asking interactive questions like "Is this ad relevant to you?".